Preface to the
Princeton Science Library Edition 


Writing to an American friend in 1901, Bertrand Russell observed:


To me, pure mathematics is one of the highest forms of art; it 
has a sublimity quite special to itself, and an immense dignity 
derived from the fact that its world is exempt from change and 
time. I am quite serious about this . . . [M]athematics is the 
only thing we know of that is capable of perfection; in thinking 
about it, we become Gods. [1] 


These striking words suggest why Russell would win the Nobel Prize for 
Literature in 1950. More to the point, they resonate with mathematicians 
everywhere. We recognize our subject as a kind of art. 

Mathematical theories, of course, do not arise in a perfect state. They 
need repeated modification. And if the fight for a mature mathematical 
theory is bloodless, it is particularly demanding because it is waged not on 
the battlefield but in the mind. When everything falls into place, however, 
the outcome can exhibit Russell’s god-like perfection. Nowhere, it seems 
to me, is this more evident than in the rigorous theory of calculus as devel- 
oped between the late seventeenth and the early twentieth centuries. That 
is the story I wanted to tell in The Calculus Gallery. 

To do this, I opted against a comprehensive survey of three centuries 
of mathematics. Instead, I chose to approach the calculus by presenting a 
few theorems from some of the greatest innovators the subject can boast. 
The idea was suggested by an art gallery, where the visitor sees a few paint- 
ings from a selection of brilliant artists. By analogy, mine would be a gal- 
lery of calculus. 

I am thrilled that the book, originally published in 2005, is now being
re-issued as part of the Princeton Science Library. This series contains
such classics as George Pólya’s How to Solve It and Edwin Abbott’s Flatland,
not to mention works by Richard Feynman and Albert Einstein. It is
humbling to find one’s own writing in such company.

And it is all the more gratifying because my book is what we in math- 
ematics call “non-trivial.” (Most people find this an odd turn of phrase, 
rather like calling a desert “non-wet.”) As noted, The Calculus Gallery 
examines theorems from history’s foremost mathematicians, and, to tell 
the story honestly, I necessarily tackled some non-trivial (i.e., “hard”)
ideas. I did my best to make the discussion accessible, but a background 
in analysis is surely helpful in following these arguments and appreciating 
their significance. 

In the new edition, changes to the original are minimal. I made some 
tweaks here and there and added this preface. But — as we math historians 
happily remind ourselves — the Newtons and Eulers of the world haven't 
proved any new theorems lately, so there is no pressing need to update 
their outputs. Instead, let me offer a thought about three of the mathema- 
ticians discussed in this book. 

Back when everything was still in manuscript form, I recall talking 
with friends about my plans. We all agreed that some mathematicians 
were so influential, so famous, that they simply had to be included. A 
book tracing the history of analysis could not omit Cauchy or Weierstrass, 
any more than a book tracing the history of basketball could omit Bill 
Russell or Magic Johnson. But one of my friends raised a question that 
initially took me aback: who is the least famous mathematician to appear 
in my gallery? 

After some thought, I decided it was René Baire, who occupies the 
book’s next-to-last chapter. A graduate student in topology will come 
upon his name at some point, but it probably will fly past rather late in 
the semester and be attached to a fairly abstract topological concept. 
Even among seasoned mathematicians, Baire’s name is not a household 
word. Yet I found his contributions to be more than worthy of a chapter, 
and reader feedback over the years has tended to endorse my decision. 
Why so? 

My answer is that Baire effected a brilliant fusion of calculus and set 
theory. Over time, the central questions of calculus had become central 
questions about functions — for example, when is a discontinuous func- 
tion integrable? It was Baire who pushed this a step further in asserting 
that “. . . any problem relative to the theory of functions leads to certain 
questions relative to the theory of sets.” [2] In resolving the latter, he could 
resolve the former. 

Set theory had appeared in the last decades of the nineteenth century. 
It was introduced by Georg Cantor and advanced by people like his young 
disciple, Vito Volterra. I discuss their mathematics in chapters 11 and 12,
respectively. But no one at the time more effectively applied set theory to 
address the problems of analysis than did René Baire. He left us with two 
ideas that could only have come from the most perceptive of mathematicians:
the Baire category theorem and the Baire classification of functions.
To this day, they provide deep insights into functions and their behavior. I
believe that, with the publication of Baire’s results, “modern” analysis had 
really arrived. You can judge for yourself in chapter 13. 

Given the focus of this book, I had to become familiar with primary 
sources, not just from Baire but from all the mathematicians in my pantheon.
I already knew some of their work, but I had much to learn. Grappling
with original mathematics was one of the pleasures of writing The
Calculus Gallery. The more I knew, the more my opinion of each mathe- 
matician rose, but two individuals particularly distinguished themselves 
in my esteem: Augustin-Louis Cauchy (chapter 6) and Henri Lebesgue 
(chapter 14). 

Cauchy was the critical figure in transforming the early, intuitive cal- 
culus into the rigorous subject of today. It is no exaggeration to say that he 
changed things forever. In particular, it was Cauchy who made “limit” the 
central idea of analysis. In chapters 1-5 of this book, the reader will see 
some great mathematics created by some great mathematicians, but limits 
are missing in action. Cauchy introduced them (albeit not in a modern 
way) and used them as the foundation for other analytic ideas. For him, 
derivatives, integrals, and sums of infinite series were defined using the 
limit concept. It was Lebesgue who noted that “Before Cauchy there was
no definition of the integral in the modern meaning of the word ‘definition.’”
[3] That is because, before Cauchy, limits were not in the mathematician’s
arsenal.

Moreover, Cauchy realized that these basic definitions must underpin 
the theorems of analysis. This would require a logical development every 
bit as precise as Euclid’s approach to geometry from twenty centuries ear- 
lier. As an illustration, I have included Cauchy’s proof of the intermediate 
value theorem for continuous functions. This result guarantees that a con- 
tinuous function that goes from a negative to a positive value must some- 
where equal zero. This seems self-evident, and earlier mathematicians had 
taken it on faith. By contrast, Cauchy realized it was a theorem to be 
proved, not a principle to be assumed. His proof remains a masterpiece, as 
you'll see in chapter 6. 

So, in writing this book, Cauchy became one of my two favorite ana- 
lysts. The other was the aforementioned Henri Lebesgue. 

He is the last mathematician treated in the book and for good rea- 
son: Lebesgue resolved so many open questions in analysis. In the pro- 
cess, he defined what we now call Lebesgue measure and the Lebesgue 
integral. He did this, in the apt words of Paul Montel, “. . . by looking at
old things with new eyes.” [4] Lebesgue’s work was stunningly original,
and, as Michel Loève observed, today’s analysis “still dances to Lebesgue’s
tunes.” [5]

It is true that Lebesgue did not have the range of achievement of an 
Euler or a Cauchy. These two contributed to virtually every branch of 
mathematics, pure and applied. With his theory of measure and the inte- 
gral, Lebesgue was more of a specialist. But G. H. Hardy, no mean analyst 
himself, put it this way in his obituary of Lebesgue: 


He was rather a man with one outstanding claim to fame. . . all 
his secondary work, of which there is not much, is overshad- 
owed by his work on integration. There he was first. The “Leb- 
esgue integral” is one of the supreme achievements of modern 
analysis. [6] 


If I have waxed enthusiastic about Baire and Cauchy and Lebesgue, 
three French geniuses of the highest order, I mustn’t suggest that the other 
mathematicians in this book are second-rate. On the contrary, everyone 
here is an all-star. In transforming the intuitive calculus into the sophisti- 
cated analysis of today, they were artists indeed. Bertrand Russell was 
right. By thinking about this mathematical adventure, we become gods. 


William Dunham 
Bryn Mawr, PA 
“The calculus,” wrote John von Neumann (1903-1957), “was the
first achievement of modern mathematics, and it is difficult to overesti- 
mate its importance” [1]. 

Today, more than three centuries after its appearance, calculus contin- 
ues to warrant such praise. It is the bridge that carries students from the 
basics of elementary mathematics to the challenges of higher mathematics 
and, as such, provides a dazzling transition from the finite to the infinite, 
from the discrete to the continuous, from the superficial to the profound. 
So esteemed is calculus that its name is often preceded by “the,” as in von 
Neumann's observation above. This gives “the calculus” a status akin to 
“the law”—that is, a subject vast, self-contained, and awesome. 

Like any great intellectual pursuit, the calculus has a rich history and 
a rich prehistory. Archimedes of Syracuse (ca. 287-212 BCE) found certain
areas, volumes, and surfaces with a technique we now recognize as proto- 
integration. Much later, Pierre de Fermat (1601-1665) determined slopes 
of tangents and areas under curves in a remarkably modern fashion. These 
and many other illustrious predecessors brought calculus to the threshold 
of existence. 

Nevertheless, this book is not about forerunners. It goes without saying
that calculus owes much to those who came before, just as modern art
owes much to the artists of the past. But a specialized museum (the
Museum of Modern Art, for instance) need not devote room after room
to premodern influences. Such an institution can, so to speak, start in the
middle. And so, I think, can I.

Thus I shall begin with the two seventeenth-century scholars, Isaac
Newton (1642-1727) and Gottfried Wilhelm Leibniz (1646-1716), who 
gave birth to the calculus. The latter was first to publish his work in a 1684 
paper whose title contained the Latin word calculi (a system of calculation) 
that would attach itself to this new branch of mathematics. The first text- 
book appeared a dozen years later, and the calculus was here to stay. 

As the decades passed, others took up the challenge. Prominent 
among these pioneers were the Bernoulli brothers, Jakob (1654-1705) 
and Johann (1667-1748), and the incomparable Leonhard Euler (1707-1783),
whose research filled many thousands of pages with mathematics
of the highest quality. Topics under consideration expanded to include
limits, derivatives, integrals, infinite sequences, infinite series, and more. 
This extended body of material has come to be known under the general 
rubric of “analysis.” 

With increased sophistication came troubling questions about the 
underlying logic. Despite the power and utility of calculus, it rested upon 
a less-than-certain foundation, and mathematicians recognized the need 
to recast the subject in a precise, rigorous fashion after the model of Euclid’s 
geometry. Such needs were addressed by nineteenth-century analysts 
like Augustin-Louis Cauchy (1789-1857), Georg Friedrich Bernhard 
Riemann (1826-1866), Joseph Liouville (1809-1882), and Karl Weier- 
strass (1815-1897). These individuals worked with unprecedented care, 
taking pains to define their terms exactly and to prove results that had 
hitherto been accepted uncritically. 

But, as often happens in science, the resolution of one problem 
opened the door to others. Over the last half of the nineteenth century, 
mathematicians employed these logically rigorous tools in concocting a 
host of strange counterexamples, the understanding of which pushed 
analysis ever further toward generality and abstraction. This trend was 
evident in the set theory of Georg Cantor (1845-1918) and in the subse- 
quent achievements of scholars like Vito Volterra (1860-1940), René Baire 
(1874-1932), and Henri Lebesgue (1875-1941). 

By the early twentieth century, analysis had grown into an enormous 
collection of ideas, definitions, theorems, and examples—and had devel- 
oped a characteristic manner of thinking—that established it as a mathe- 
matical enterprise of the highest rank. 

What follows is a sampler from that collection. My goal is to examine 
the handiwork of those individuals mentioned above and to do so in a
manner faithful to the originals yet comprehensible to a modern reader. I
shall discuss theorems illustrating the development of calculus over its
formative years and the genius of its most illustrious practitioners. The 
book will be, in short, a “great theorems” approach to this fascinating 
story. 

To this end I have restricted myself to the work of a few representative 
mathematicians. At the outset I make a full disclosure: my cast of characters 
was dictated by personal taste. Some whom I have included, like Newton, 
Cauchy, and Weierstrass, would appear in any book with similar objectives.
Some, like Liouville, Volterra and Baire, are more idiosyncratic. And others, 
like Gauss, Bolzano, and Abel, failed to make my cut. 
Likewise, some of the theorems I discuss are known to any mathematically
literate reader, although their original proofs may come as a surprise
to those not conversant with the history of mathematics. Into this category
fall Leibniz’s barely recognizable derivation of the “Leibniz series” from
1673 and Cantor's first but less-well-known proof of the nondenumer- 
ability of the continuum from 1874. Other theorems, although part of the 
folklore of mathematics, seldom appear in modern textbooks; here I am 
thinking of a result like Weierstrass’s everywhere continuous, nowhere dif- 
ferentiable function that so astounded the mathematical world when it 
was presented to the Berlin Academy in 1872. And some of my choices,
I concede, are downright quirky. Euler’s evaluation of $\int_0^1 \frac{\sin(\ln x)}{\ln x}\,dx$, for
example, is included simply as a demonstration of his analytic wizardry.

Each result, from Newton's derivation of the sine series to the appear- 
ance of the gamma function to the Baire category theorem, stood at the 
research frontier of its day. Collectively, they document the evolution of 
analysis over time, with the attendant changes in style and substance. This 
evolution is striking, for the difference between a theorem from Lebesgue 
in 1904 and one from Leibniz in 1690 can be likened to the difference 
between modern literature and Beowulf. Nonetheless—and this is critical— 
I believe that each theorem reveals an ingenuity worthy of our attention 
and, even more, of our admiration. 

Of course, trying to characterize analysis by examining a few theorems 
is like trying to characterize a thunderstorm by collecting a few raindrops. 
The impression conveyed will be hopelessly incomplete. To undertake 
such a project, an author must adopt some fairly restrictive guidelines. 

One of mine was to resist writing a comprehensive history of analysis. 
That is far too broad a mission, and, in any case, there are many works that 
describe the development of calculus. Some of my favorites are mentioned 
explicitly in the text or appear as sources in the notes at the end of the book. 

A second decision was to exclude topics from both multivariate calcu- 
lus and complex analysis. This may be a regrettable choice, but I believe it 
is a defensible one. It has imposed some manageable boundaries upon the 
contents of the book and thereby has added coherence to the tale. Simultaneously,
this restriction should minimize demands upon the reader’s
background, for a volume limited to topics from univariate, real analysis
should be understandable to the widest possible audience. 

This raises the issue of prerequisites. The book’s objectives dictate that 
I include much technical detail, so the mathematics necessary to follow 
these theorems is substantial. Some of the early results require considerable
algebraic stamina in chasing formulas across the page. Some of the
later ones demand a refined sense of abstraction. All in all, I would not 
recommend this for the mathematically faint-hearted. 

At the same time, in an attempt to favor clarity over conciseness, I 
have adopted a more conversational style than one would find in a stan- 
dard text. I intend that the book be accessible to those who have majored 
or minored in college mathematics and who are not put off by an integral 
here or an epsilon there. My goal is to keep the prerequisites as modest as 
the topics permit, but no less so. To do otherwise, to water down the con- 
tent, would defeat my broader purpose. 

So, this is not primarily a biography of mathematicians, nor a history 
of calculus, nor a textbook. I say this despite the fact that at times I pro- 
vide biographical information, at times I discuss the history that ties one 
topic to another, and at times I introduce unfamiliar (or perhaps long for- 
gotten) ideas in a manner reminiscent of a textbook. But my foremost 
motivation is simple: to share some favorite results from the rich history of 
analysis. 

And this brings me to a final observation. 

In most disciplines there is a tradition of studying the major works of 
illustrious predecessors, the so-called “masters” of the field. Students of lit- 
erature read Shakespeare; students of music listen to Bach. In mathematics 
such a tradition is, if not entirely absent, at least fairly uncommon. This 
book is meant to address that situation. Although it is not intended as a his- 
tory of the calculus, I have come to regard it as a gallery of the calculus. 

To this end, I have assembled a number of masterpieces, although these 
are not the paintings of Rembrandt or Van Gogh but the theorems of Euler 
or Riemann. Such a gallery may be a bit unusual, but its objective is that of 
all worthy museums: to serve as a repository of excellence. 

Like any gallery, this one has gaps in its collection. Like any gallery, 
there is not space enough to display all that one might wish. These limi- 
tations notwithstanding, a visitor should come away enriched by an 
appreciation of genius. And, in the final analysis, those who stroll among 
the exhibits should experience the mathematical imagination at its most 
profound. 


CHAPTER 1


Newton 


Isaac Newton 


Isaac Newton (1642-1727) stands as a seminal figure not just in mathematics
but in all of Western intellectual history. He was born into a world
where science had yet to establish a clear supremacy over medieval super- 
stition. By the time of his death, the Age of Reason was in full bloom. This 
remarkable transition was due in no small part to his own contributions. 
For mathematicians, Isaac Newton is revered as the creator of calculus, 
or, to use his name for it, of “fluxions.” Its origin dates to the mid-1660s 
when he was a young scholar at Trinity College, Cambridge. There he had 
absorbed the work of such predecessors as René Descartes (1596-1650), 
John Wallis (1616-1703), and Trinity’s own Isaac Barrow (1630-1677),
but he soon found himself moving into uncharted territory. During the 
next few years, a period his biographer Richard Westfall characterized as 
one of “incandescent activity,” Newton changed forever the mathematical 
landscape [1]. By 1669, Barrow himself was describing his colleague as 
“a fellow of our College and very young . . . but of an extraordinary genius
and proficiency” [2]. 

In this chapter, we look at a few of Newton’s early achievements: his
generalized binomial expansion for turning certain expressions into infinite 
series, his technique for finding inverses of such series, and his quadrature 
rule for determining areas under curves. We conclude with a spectacular 
consequence of these: the series expansion for the sine of an angle. Newton’s
account of the binomial expansion appears in his epistola prior, a letter
he sent to Leibniz in the summer of 1676, long after he had done the
original work. The other discussions come from Newton's 1669 treatise De 
analysi per aequationes numero terminorum infinitas, usually called simply 
the De analysi. 

Although this chapter is restricted to Newton's early work, we note that 
“early” Newton tends to surpass the mature work of just about anyone else. 


GENERALIZED BINOMIAL EXPANSION 


By 1665, Isaac Newton had found a simple way to expand (his word
was “reduce”) binomial expressions into series. For him, such reductions
would be a means of recasting binomials in alternate form as well as an
entryway into the method of fluxions. This theorem was the starting point
for much of Newton's mathematical innovation.

As described in the epistola prior, the issue at hand was to reduce the
binomial $(P + PQ)^{m/n}$ and to do so whether $m/n$ “is integral or (so to speak)
fractional, whether positive or negative” [3]. This in itself was a bold idea
for a time when exponents were sufficiently unfamiliar that they had first
to be explained, as Newton did by stressing that “instead of $\sqrt{a}$, $\sqrt{a^3}$, $\sqrt[3]{a^5}$,
etc. I write $a^{1/2}$, $a^{3/2}$, $a^{5/3}$, and instead of $1/a$, $1/aa$, $1/a^3$, I write $a^{-1}$, $a^{-2}$,
$a^{-3}$” [4]. Apparently readers of the day needed a gentle reminder.

Newton discovered a pattern for expanding not only elementary binomials
like $(1 + x)^5$ but more sophisticated ones like $\dfrac{1}{\sqrt[3]{(1+x)^5}} = (1+x)^{-5/3}$.


The reduction, as Newton explained to Leibniz, obeyed the rule

$$(P + PQ)^{m/n} = P^{m/n} + \frac{m}{n}AQ + \frac{m-n}{2n}BQ + \frac{m-2n}{3n}CQ + \frac{m-3n}{4n}DQ + \text{etc.}, \qquad (1)$$
where each of A, B, C, . . . represents the previous term, as will be illustrated
below. This is his famous binomial expansion, although perhaps in
an unfamiliar guise.


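Newton provided the example of $\sqrt{c^2 + x^2} = \left(c^2 + c^2\cdot\dfrac{x^2}{c^2}\right)^{1/2}$.
Here, $P = c^2$, $Q = x^2/c^2$, $m = 1$, and $n = 2$. Thus,

$$\sqrt{c^2+x^2} = (c^2)^{1/2} + \frac{1}{2}A\frac{x^2}{c^2} - \frac{1}{4}B\frac{x^2}{c^2} - \frac{3}{6}C\frac{x^2}{c^2} - \frac{5}{8}D\frac{x^2}{c^2} + \cdots$$

To identify A, B, C, and the rest, we recall that each is the immediately
preceding term. Thus, $A = (c^2)^{1/2} = c$, giving us

$$B = \frac{1}{2}A\frac{x^2}{c^2} = \frac{1}{2}c\,\frac{x^2}{c^2} = \frac{x^2}{2c}.$$

The analogous substitutions yield $C = -\dfrac{x^4}{8c^3}$ and then $D = \dfrac{x^6}{16c^5}$. Working
from left to right in this fashion, Newton arrived at

$$\sqrt{c^2+x^2} = c + \frac{x^2}{2c} - \frac{x^4}{8c^3} + \frac{x^6}{16c^5} - \frac{5x^8}{128c^7} + \cdots$$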


Obviously, the technique has a recursive flavor: one finds the coefficient
of $x^8$ from the coefficient of $x^6$, which in turn requires the coefficient
of $x^4$, and so on. Although the modern reader is probably accustomed to a
“direct” statement of the binomial theorem, Newton’s recursion has an un- 
deniable appeal, for it streamlines the arithmetic when calculating a nu- 
merical coefficient from its predecessor. 
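Newton's recursion is easy to mechanize. The Python sketch below is only an illustration under our own conventions, not anything Newton wrote: it factors out the leading term $P^{m/n}$ and uses exact fractions, so that each coefficient of $Q^k$ is obtained from its predecessor by the factors $\frac{m}{n}, \frac{m-n}{2n}, \frac{m-2n}{3n}, \ldots$ of rule (1). The function name `newton_binomial_coeffs` is hypothetical.

```python
from fractions import Fraction

def newton_binomial_coeffs(m, n, terms):
    """Coefficients of Q**k in (1 + Q)**(m/n), following Newton's rule (1).

    The leading term P**(m/n) is factored out, so the first coefficient is 1.
    Each later coefficient is the previous one times (m - k*n) / ((k + 1)*n),
    just as B is obtained from A, C from B, and so on.
    """
    coeffs = [Fraction(1)]
    for k in range(terms - 1):
        coeffs.append(coeffs[-1] * Fraction(m - k * n, (k + 1) * n))
    return coeffs

# sqrt(c**2 + x**2) = c * (1 + x**2/c**2)**(1/2), so the k-th term is
# coeff * x**(2k) / c**(2k - 1); for k = 0 this is just coeff * c.
for k, coeff in enumerate(newton_binomial_coeffs(1, 2, 5)):
    print(f"coefficient of x^{2 * k} / c^{2 * k - 1}: {coeff}")
```

The printed coefficients 1, 1/2, -1/8, 1/16, -5/128 reproduce Newton's expansion of $\sqrt{c^2+x^2}$. For a positive integer exponent such as $(1+Q)^5$, the factor $m - kn$ eventually becomes zero and the recursion terminates, recovering the ordinary binomial coefficients.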

For the record, it is a simple matter to replace A, B, C, . . . by their
equivalent expressions in terms of P and Q, then factor the common
$P^{m/n}$ from both sides of (1), and so arrive at the result found in today’s
texts:

$$(1+Q)^{m/n} = 1 + \frac{m}{n}Q + \frac{\frac{m}{n}\left(\frac{m}{n}-1\right)}{2\times 1}Q^2 + \frac{\frac{m}{n}\left(\frac{m}{n}-1\right)\left(\frac{m}{n}-2\right)}{3\times 2\times 1}Q^3 + \cdots \qquad (2)$$
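For comparison, formula (2) can be evaluated directly rather than recursively. The Python sketch below (the names `gen_binom` and `binomial_partial_sum` are our own) sums the first few terms of (2) at a numerical value of $Q$ and sets the result beside the floating-point value of $(1+Q)^{m/n}$. It assumes $|Q| < 1$, where the series converges; convergence is not discussed in the text above.

```python
from math import factorial, prod

def gen_binom(r, k):
    """Generalized binomial coefficient r(r-1)(r-2)...(r-k+1) / k!, as in (2)."""
    return prod(r - j for j in range(k)) / factorial(k)

def binomial_partial_sum(m, n, q, terms):
    """Sum of the first `terms` terms of (2) for (1 + Q)**(m/n) at Q = q."""
    r = m / n
    return sum(gen_binom(r, k) * q ** k for k in range(terms))

# Compare partial sums of (2) with (1 + q)**(m/n) computed directly.
m, n, q = 1, 2, 0.21
for terms in (2, 4, 8, 16):
    print(terms, binomial_partial_sum(m, n, q, terms), (1 + q) ** (m / n))
```

With $Q = 0.21$ the partial sums close in on $\sqrt{1.21} = 1.1$ within a handful of terms.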


Newton likened such reductions to the conversion of square roots 
into infinite decimals, and he was not shy in touting the benefits of the 
operation. “It is a convenience attending infinite series,” he wrote in
1671,


that all kinds of complicated terms . . . may be reduced to the class 
of simple quantities, i.e., to an infinite series of fractions whose numerators
and denominators are simple terms, which will thus be
freed from those difficulties that in their original form seem’d al- 
most insuperable. [5] 


To be sure, freeing mathematics from insuperable difficulties is a worthy 
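undertaking.

One additional example may be helpful. Consider the expansion of
$\dfrac{1}{\sqrt{1-x^2}}$, which Newton put to good use in a result we shall discuss later
in the chapter. We first write this as $(1 - x^2)^{-1/2}$, identify $m = -1$, $n = 2$,
and $Q = -x^2$, and apply (2):

$$\frac{1}{\sqrt{1-x^2}} = 1 + \left(-\frac{1}{2}\right)(-x^2) + \frac{(-1/2)(-3/2)}{2\times 1}(-x^2)^2 + \frac{(-1/2)(-3/2)(-5/2)}{3\times 2\times 1}(-x^2)^3 + \frac{(-1/2)(-3/2)(-5/2)(-7/2)}{4\times 3\times 2\times 1}(-x^2)^4 + \cdots$$

$$= 1 + \frac{1}{2}x^2 + \frac{3}{8}x^4 + \frac{5}{16}x^6 + \frac{35}{128}x^8 + \cdots \qquad (3)$$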
Newton would “check” an expansion like (3) by squaring the series
and examining the answer. If we do the same, restricting our attention to
terms of degree no higher than $x^8$, we get

$$\left(1 + \frac{1}{2}x^2 + \frac{3}{8}x^4 + \frac{5}{16}x^6 + \frac{35}{128}x^8 + \cdots\right)\left(1 + \frac{1}{2}x^2 + \frac{3}{8}x^4 + \frac{5}{16}x^6 + \frac{35}{128}x^8 + \cdots\right) = 1 + x^2 + x^4 + x^6 + x^8 + \cdots,$$

where all of the coefficients miraculously turn out to be 1 (try it!). The resulting
product, of course, is an infinite geometric series with common ratio
$x^2$, which, by the well-known formula, sums to $\dfrac{1}{1-x^2}$. But if the square of the
series in (3) is $\dfrac{1}{1-x^2}$, we conclude that the series itself must be $\dfrac{1}{\sqrt{1-x^2}}$.

Voilà!
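Newton's squaring test can be repeated mechanically. The Python sketch below (helper names are ours) generates the coefficients of (3) from formula (2) with exact fractions, multiplies the truncated series by itself, and keeps only the coefficients that the truncation fully determines.

```python
from fractions import Fraction

def inv_sqrt_one_minus_x2(terms):
    """Coefficients of x**0, x**2, x**4, ... in (1 - x**2)**(-1/2).

    Uses formula (2) with m/n = -1/2 and Q = -x**2: the coefficient of Q**(k+1)
    is the previous one times (m/n - k)/(k + 1), and the sign of Q adds a factor -1.
    """
    r = Fraction(-1, 2)
    coeffs = [Fraction(1)]
    for k in range(terms - 1):
        coeffs.append(coeffs[-1] * (r - k) / (k + 1) * -1)
    return coeffs

def square_truncated(coeffs):
    """Square a series in x**2, keeping only the coefficients the truncation fixes."""
    return [sum(coeffs[i] * coeffs[k - i] for i in range(k + 1))
            for k in range(len(coeffs))]

series = inv_sqrt_one_minus_x2(5)
print([str(c) for c in series])
print([str(c) for c in square_truncated(series)])
```

The first line lists 1, 1/2, 3/8, 5/16, 35/128, the coefficients of (3); the second lists five 1s, the start of the geometric series $1 + x^2 + x^4 + \cdots$.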

Newton regarded such calculations as compelling evidence for his gen- 
eral result. He asserted that the “common analysis performed by means of 
equations of a finite number of terms” may be extended to such infinite ex- 
pressions “albeit we mortals whose reasoning powers are confined within 
narrow limits, can neither express nor so conceive all the terms of these 
equations, as to know exactly from thence the quantities we want” [6]. 


INVERTING SERIES 


Having described a method for reducing certain binomials to infinite 
series of the form $z = A + Bx + Cx^2 + Dx^3 + \cdots$, Newton next sought a
way of finding the series for x in terms of z. In modern terminology, he 
was seeking the inverse relationship. The resulting technique involves a 
bit of heavy algebraic lifting, but it warrants our attention for it too will 
appear later on. As Newton did, we describe the inversion procedure by 
means of a specific example. 

Beginning with the series $z = x - x^2 + x^3 - x^4 + \cdots$, we rewrite it as

$$(x - x^2 + x^3 - x^4 + \cdots) - z = 0 \qquad (4)$$


and discard all powers of x greater than or equal to the quadratic. This, of 
course, leaves x — z= 0, and so the inverted series begins as x = z. 
Newton was aware that discarding all those higher degree terms rendered
the solution inexact. The exact answer would have the form $x = z + p$,
where p is a series yet to be determined. Substituting $z + p$ for x in (4)
gives

$$[(z+p) - (z+p)^2 + (z+p)^3 - (z+p)^4 + \cdots] - z = 0,$$

which we then expand and rearrange to get

$$[-z^2 + z^3 - z^4 + \cdots] + [1 - 2z + 3z^2 - 4z^3 + \cdots]p + [-1 + 3z - 6z^2 + 10z^3 - \cdots]p^2 + [1 - 4z + 10z^2 - \cdots]p^3 + [-1 + 5z - \cdots]p^4 + \cdots = 0. \qquad (5)$$

Next, jettison the quadratic, cubic, and higher degree terms in p and solve
to get

$$p = \frac{z^2 - z^3 + z^4 - \cdots}{1 - 2z + 3z^2 - 4z^3 + \cdots}.$$

Newton now did a second round of weeding, as he tossed out all but
the lowest power of z in numerator and denominator. Hence p is approximately
$z^2/1 = z^2$, and the inverted series at this stage looks like $x = z + p = z + z^2$.


But p is not exactly $z^2$. Rather, we say $p = z^2 + q$, where q is a series to
be determined. To do so, we substitute into (5) to get

$$[-z^2 + z^3 - z^4 + \cdots] + [1 - 2z + 3z^2 - 4z^3 + \cdots](z^2 + q) + [-1 + 3z - 6z^2 + 10z^3 - \cdots](z^2 + q)^2 + [1 - 4z + 10z^2 - \cdots](z^2 + q)^3 + \cdots = 0.$$

We expand and collect terms by powers of q:

$$[-z^3 + z^4 - z^6 + \cdots] + [1 - 2z + z^2 + 2z^3 - \cdots]q + [-1 + 3z - 3z^2 - 2z^3 + \cdots]q^2 + \cdots = 0. \qquad (6)$$

As before, discard terms involving powers of q above the first, solve to
get $q = \dfrac{z^3 - z^4 + z^6 - \cdots}{1 - 2z + z^2 + 2z^3 - \cdots}$, and then drop all but the lowest degree
terms top and bottom to arrive at $q = z^3/1 = z^3$. At this point, the series looks like

$$x = z + z^2 + z^3 + \cdots.$$
The process would be continued by substituting $q = z^3 + r$ into (6).
Newton, who had a remarkable tolerance for algebraic monotony, seemed 
able to continue such calculations ad infinitum (almost). But eventually 
even he was ready to step back, examine the output, and seek a pattern. 
Newton put it this way: “Let it be observed here, by the bye, that when 5 
or 6 terms... are known, they may be continued at pleasure for most 
part, by observing the analogy of the progression” [7]. 

For our example, such an examination suggests that $x = z + z^2 + z^3 + z^4 + z^5 + \cdots$
is the inverse of the series $z = x - x^2 + x^3 - x^4 + \cdots$ with
which we began. 

In what sense can this be trusted? After all, Newton discarded most of 
his terms most of the time, so what confidence remains that the answer is 
correct? 

Again, we take comfort in the following “check.” The original series 
$z = x - x^2 + x^3 - x^4 + \cdots$ is geometric with common ratio $-x$, and so in
closed form $z = \dfrac{x}{1+x}$. Consequently, $x = \dfrac{z}{1-z}$, which we recognize to be
the sum of the geometric series $z + z^2 + z^3 + z^4 + z^5 + \cdots$. This is precisely
the result to which Newton's procedure had led us. Everything
seems to be in working order. 
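Newton's inversion procedure can be automated with truncated power series. The Python sketch below is one way to do it under our own simplifications: series are lists of exact fractions cut off at a fixed degree `N`, and each round keeps only the lowest-degree term of the correction, which is what Newton's "second round of weeding" leaves behind. It assumes $f(0) = 0$ and a nonzero coefficient of $x$, so the lowest term of the denominator is always that coefficient. The names `mul`, `compose`, `invert_series` and the cutoff `N = 8` are hypothetical choices, not Newton's.

```python
from fractions import Fraction

N = 8  # truncation degree: keep powers of z up to z**N

def mul(a, b):
    """Product of two truncated power series (index = power of z)."""
    out = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a[: N + 1]):
        if ai:
            for j, bj in enumerate(b[: N + 1 - i]):
                out[i + j] += ai * bj
    return out

def compose(f, s):
    """Coefficients of f(s(z)) up to z**N, where f lists coefficients of x**k."""
    out = [Fraction(0)] * (N + 1)
    power = [Fraction(1)] + [Fraction(0)] * N  # s(z)**0
    for coeff in f:
        out = [o + coeff * p for o, p in zip(out, power)]
        power = mul(power, s)
    return out

def invert_series(f):
    """Invert z = f(x), assuming f(0) = 0 and f'(0) != 0, one term per round.

    Each round substitutes the current approximation S into f(x) - z, keeps only
    the lowest-degree leftover term, and divides it by f'(0), mirroring how
    Newton drops everything but the lowest powers of z.
    """
    assert f[0] == 0 and f[1] != 0
    s = [Fraction(0)] * (N + 1)
    for _ in range(N):
        residual = compose(f, s)
        residual[1] -= 1  # subtract z itself
        d = next((k for k, r in enumerate(residual) if r != 0), None)
        if d is None:
            break  # S already solves f(S) = z through degree N
        s[d] -= residual[d] / f[1]
    return s

# z = x - x^2 + x^3 - x^4 + ... (truncated at x**N)
f = [Fraction(0)] + [Fraction((-1) ** (k + 1)) for k in range(1, N + 1)]
print([str(c) for c in invert_series(f)])
```

The output is a 0 followed by eight 1s, that is, $x = z + z^2 + \cdots + z^8$, matching the geometric series obtained above. The same routine applied to $z = x + x^2$ yields $x = z - z^2 + 2z^3 - 5z^4 + 14z^5 - \cdots$.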

The techniques encountered thus far—the generalized binomial ex- 
pansion and the inversion of series—would be powerful tools in Newton's 
hands. There remains one last prerequisite, however, before we can truly 
appreciate the master at work. 


QUADRATURE RULES FROM THE DE ANALYSI


In his De analysi of 1669, Newton promised to describe the method 
“which I had devised some considerable time ago, for measuring the quan- 
tity of curves, by means of series, infinite in the number of terms” [8]. This 
was not Newton’s first account of his fluxional discoveries, for he had
drafted an October 1666 tract along these same lines. The De analysi was a 
revision that displayed the polish of a maturing thinker. Modern scholars 
find it strange that the secretive Newton withheld this manuscript from all 
but a few lucky colleagues, and it did not appear in print until 1711, long 
after many of its results had been published by others. Nonetheless, the 
early date and illustrious authorship justify its description as “perhaps the 
most celebrated of all Newton’s mathematical writings” [9]. 
The treatise began with a statement of the three rules for “the quadrature
of simple curves.” In the seventeenth century, quadrature meant determination
of area, so these are just integration rules.


Rule 1. The quadrature of simple curves: If $y = ax^{m/n}$ is the curve
AD, where a is a constant and m and n are positive integers, then
the area of region ABD is $\dfrac{an}{m+n}x^{(m+n)/n}$ (see figure 1.1).

A modern version of this would identify A as the origin, B as (x, 0), and
the curve as $y = at^{m/n}$. Newton’s statement then becomes

$$\int_0^x at^{m/n}\,dt = \frac{ax^{(m/n)+1}}{(m/n)+1} = \frac{an}{m+n}\,x^{(m+n)/n},$$

which is just a special case of the power rule from integral calculus.
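As a quick numerical sanity check of Rule 1 (not part of Newton's argument), the Python sketch below compares $\frac{an}{m+n}x^{(m+n)/n}$ with a midpoint-rule approximation of $\int_0^x at^{m/n}\,dt$. The function names, the sample values of a, m, n, x, and the step count are arbitrary choices made for illustration.

```python
def newton_rule_1(a, m, n, x):
    """Area under y = a*t**(m/n) from 0 to x, according to Rule 1."""
    return a * n / (m + n) * x ** ((m + n) / n)

def midpoint_area(a, m, n, x, steps=100_000):
    """Independent midpoint-rule approximation of the same area."""
    h = x / steps
    return h * sum(a * ((k + 0.5) * h) ** (m / n) for k in range(steps))

for a, m, n, x in [(1, 1, 2, 4.0), (3, 2, 3, 2.0), (2, 5, 1, 1.5)]:
    exact = newton_rule_1(a, m, n, x)
    approx = midpoint_area(a, m, n, x)
    print(f"a={a}, m/n={m}/{n}, x={x}: Rule 1 gives {exact:.6f}, midpoint rule {approx:.6f}")
```

For instance, with $a = 1$, $m/n = 1/2$, and $x = 4$, Rule 1 gives $\frac{2}{3}\cdot 4^{3/2} = 16/3$, and the midpoint sum agrees to several decimal places.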

Only at the end of the De analysi did Newton observe, almost as an af- 
terthought, that “an attentive reader” would want to see a proof for Rule 1 
[10]. Attentive as always, we present his argument below. 

Again, let the curve be AD with AB=x and BD=y, as shown in 
figure 1.2. Newton assumed that the area ABD beneath the curve was given 
by an expression z written in terms of x. The goal was to find a corresponding 


y=ax™? 


Figure 1.1 
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Figure 1.2 


formula for y in terms of x. From a modern vantage point, he was beginning 
with z= I y(t)dt and seeking y = y(x). His derivation blended geometry, 
algebra, and fluxions before ending with a few dramatic flourishes. 

At the outset, Newton let β be a point on the horizontal axis a tiny distance o from B. Thus, segment Aβ has length x + o. He let z be the area ABD, although to emphasize the functional relationship we shall take the liberty of writing z = z(x). Hence, z(x + o) is the area Aβδ under the curve. Next he introduced rectangle BβHK of height v = BK = βH, the area of which he stipulated to be exactly that of region BβδD beneath the curve. In other words, the area of BβδD was to be ov.


At this point, Newton specified that $z(x) = \frac{an}{m+n}x^{(m+n)/n}$ and proceeded to find the instantaneous rate of change of z. To do so, he examined the change in z divided by the change in x as the latter becomes small. For notational ease, he temporarily let c = an/(m + n) and p = m + n so that $z(x) = cx^{p/n}$ and

$$[z(x)]^n = c^n x^p. \tag{7}$$


Now, z(x + o) is the area Aβδ, which can be decomposed into the area of ABD and that of BβδD. The latter, as noted, is the same as rectangular area ov, and so Newton concluded that z(x + o) = z(x) + ov. Substituting into (7), he got

$$[z(x) + ov]^n = [z(x+o)]^n = c^n (x+o)^p,$$

and the binomials on the left and right were expanded to

$$[z(x)]^n + n[z(x)]^{n-1}ov + \frac{n(n-1)}{2}[z(x)]^{n-2}o^2v^2 + \cdots = c^n x^p + c^n p x^{p-1}o + c^n\frac{p(p-1)}{2}x^{p-2}o^2 + \cdots$$


Applying (7) to cancel the leftmost terms on each side and then dividing through by o, Newton arrived at

$$n[z(x)]^{n-1}v + \frac{n(n-1)}{2}[z(x)]^{n-2}ov^2 + \cdots = c^n p x^{p-1} + c^n\frac{p(p-1)}{2}x^{p-2}o + \cdots \tag{8}$$


At that point, he wrote, “If we suppose Bβ to be diminished infinitely and to vanish, or o to be nothing, v and y in that case will be equal, and the terms which are multiplied by o will vanish” [11]. He was asserting that, as o becomes zero, so do all terms in (8) that contain o. At the same time, v becomes equal to y, which is to say that the height BK of the rectangle in figure 1.2 will equal the ordinate BD of the original curve. In this way, (8) transforms into

$$n[z(x)]^{n-1}y = c^n p x^{p-1}. \tag{9}$$


A modern reader is likely to respond, “Not so fast, Isaac!” When New- 
ton divided by o, that quantity most certainly was not zero. A moment 
later, it was zero. There, in a nutshell, lay the rub. This zero/nonzero dichotomy would trouble analysts for the next century and then some. We
shall have much more to say about this later in the book. 
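
A modern reading of Newton’s argument replaces “o becomes zero” with a limit. In the sketch below (our own illustration, with the hypothetical sample values a = 1, m = 2, n = 3 and x = 1.5), the height v of Newton’s rectangle is computed exactly as (z(x + o) − z(x))/o, and we watch it approach $y = ax^{m/n}$ as o shrinks, without ever setting o equal to zero.

```python
def z(x: float, a: float = 1.0, m: int = 2, n: int = 3) -> float:
    """Area accumulator from Rule 1: z(x) = (a*n/(m+n)) * x**((m+n)/n)."""
    return a * n / (m + n) * x ** ((m + n) / n)


def y(x: float, a: float = 1.0, m: int = 2, n: int = 3) -> float:
    """The curve Newton recovered: y = a * x**(m/n)."""
    return a * x ** (m / n)


x0 = 1.5
for o in (1e-1, 1e-2, 1e-3, 1e-4):
    # Newton's rectangle has area o*v = z(x+o) - z(x), so v is this quotient
    v = (z(x0 + o) - z(x0)) / o
    print(f"o = {o:g}: v = {v:.8f}, y = {y(x0):.8f}, v - y = {v - y(x0):.2e}")
```

The gap v − y shrinks roughly in proportion to o, which is the behavior a limit-based definition of the derivative makes precise.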

But Newton proceeded. In (9) he substituted for z(x), c, and p and solved for

$$y = \frac{c^n p x^{p-1}}{n[z(x)]^{n-1}} = \frac{c^n (m+n) x^{m+n-1}}{n\,c^{n-1} x^{(m+n)(n-1)/n}} = \frac{c(m+n)}{n}x^{m/n} = ax^{m/n}.$$

Thus, starting from his assumption that the area ABD is given by $z(x) = \frac{an}{m+n}x^{(m+n)/n}$, Newton had deduced that curve AD must satisfy the equation $y = ax^{m/n}$. He had, in essence, differentiated the integral. Then, without further justification, he stated, “Wherefore conversely, if $ax^{m/n} = y$, it shall be $\frac{an}{m+n}x^{(m+n)/n} = z$.” His proof of rule 1 was finished [12].

This was a peculiar twist of logic. Having derived the equation of y from that of its area accumulator z, Newton asserted that the relationship went the other way and that the area under $y = ax^{m/n}$ is indeed $\frac{an}{m+n}x^{(m+n)/n}$.

Such an argument tends to leave us with mixed feelings, for it features some gaping logical chasms. Derek Whiteside, editor of Newton's mathematical papers, aptly characterized this quadrature proof as “a brief, scarcely comprehensible appearance of fluxions” [13]. On the other hand, it is important to remember the source. Newton was writing at the very beginning of the long calculus journey. Within the context of his time, the proof was groundbreaking, and his conclusion was correct. Something rings true in Richard Westfall’s observation that, “however briefly, De analysi did indicate the full extent and power of the fluxional method” [14].

Whatever the modern verdict, Newton was satisfied. His other two rules, for which the De analysi contained no proofs, were as follows:


Rule 2. The quadrature of curves compounded of simple ones: If 
the value of y be made up of several such terms, the area likewise 
shall be made up of the areas which result from every one of the 
terms. [15] 


Rule 3. The quadrature of all other curves: But if the value of y, or 
any of its terms be more compounded than the foregoing, it must 
be reduced into more simple terms . . . and afterwards by the pre- 
ceding rules you will discover the [area] of the curve sought. [16] 


Newton’s second rule affirmed that the integral of the sum of finitely
many terms is the sum of the integrals. This he illustrated with an example 
or two. The third rule asserted that, when confronted with a more compli- 
cated expression, one was first to “reduce” it into an infinite series, integrate 
each term of the series by means of the first rule, and then sum the results. 

This last was an appealing idea. More to the point, it was the final pre- 
requisite Newton would need to derive a mathematical blockbuster: the 
infinite series for the sine of an angle. This great theorem from the De 
analysi will serve as the chapter's climax. 

NEWTON’S DERIVATION OF THE SINE SERIES


Consider in figure 1.3 the quadrant of a circle centered at the origin and with radius 1, where as before AB = x and BD = y. Newton's initial objective was to find an expression for the length of arc αD [17].

From D, draw DT tangent to the circle, and let BK be “the moment of the base AB.” In a notation that would become standard after Newton, we let BK = dx. This created the “indefinitely small” right triangle DGH, whose hypotenuse DH Newton regarded as the moment of the arc αD. We write DH = dz, where z = z(x) stands for the length of arc αD. Because all of this is occurring within the unit circle, the radian measure of ∠αAD is z as well.

Under this scenario, the infinitely small triangle DGH is similar to triangle DBT so that $\frac{GH}{DH} = \frac{BT}{DT}$. Moreover, radius AD is perpendicular to tangent line DT, and so altitude BD splits right triangle ADT into similar pieces: triangles DBT and ABD. It follows that $\frac{BT}{DT} = \frac{BD}{AD}$, and from these two proportions we conclude that $\frac{GH}{DH} = \frac{BD}{AD}$. With the differential notation above, this amounts to $\frac{dx}{dz} = \frac{y}{1}$ and hence $dz = \frac{dx}{y}$.


Figure 1.3

Newton’s next step was to exploit the circular relationship $y = \sqrt{1 - x^2}$ to conclude that $dz = \frac{dx}{y} = \frac{dx}{\sqrt{1-x^2}}$. Expanding as in (3) led to

$$dz = \left[1 + \frac{1}{2}x^2 + \frac{3}{8}x^4 + \frac{5}{16}x^6 + \frac{35}{128}x^8 + \cdots\right]dx,$$

and so

$$z = z(x) = \int_0^x \left[1 + \frac{1}{2}t^2 + \frac{3}{8}t^4 + \frac{5}{16}t^6 + \frac{35}{128}t^8 + \cdots\right]dt.$$

Finding the quadratures of these individual powers and summing the results by Rule 3, Newton concluded that the arclength of αD was

$$z = x + \frac{1}{6}x^3 + \frac{3}{40}x^5 + \frac{5}{112}x^7 + \frac{35}{1152}x^9 + \cdots \tag{10}$$

Referring again to figure 1.3, we see that z is not only the radian measure of ∠αAD, but the measure of ∠ADB as well. From triangle ABD, we know that sin z = x and so

$$\arcsin x = z = x + \frac{1}{6}x^3 + \frac{3}{40}x^5 + \frac{5}{112}x^7 + \frac{35}{1152}x^9 + \cdots$$


Thus, beginning with the algebraic expression $\frac{1}{\sqrt{1-x^2}}$, Newton had used his generalized binomial expansion and basic integration to derive the series for arcsine, an intrinsically more complicated relationship.
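
Newton’s route to (10) can be retraced with exact rational arithmetic. This sketch (our own, using Python’s `fractions.Fraction`) generates the coefficients of $(1 - x^2)^{-1/2}$ from the recurrence $c_k = c_{k-1}(2k-1)/(2k)$, which follows from the generalized binomial coefficients for the exponent −1/2, integrates term by term as Rule 3 prescribes, and compares a truncated sum with the library function `math.asin`. The truncation at 40 terms and the function names are arbitrary choices.

```python
import math
from fractions import Fraction


def binomial_coeffs(k_max: int) -> list[Fraction]:
    """Coefficients c_k of (1 - x^2)^(-1/2) = sum of c_k * x^(2k)."""
    coeffs = [Fraction(1)]
    for k in range(1, k_max + 1):
        # ratio of successive generalized binomial coefficients
        coeffs.append(coeffs[-1] * Fraction(2 * k - 1, 2 * k))
    return coeffs


def arcsin_coeffs(k_max: int) -> list[Fraction]:
    """Integrate term by term (Rule 3): x^(2k) becomes x^(2k+1) / (2k+1)."""
    return [c / (2 * k + 1) for k, c in enumerate(binomial_coeffs(k_max))]


def arcsin_series(x: float, k_max: int = 40) -> float:
    """Truncated arcsine series evaluated at x (intended for |x| < 1)."""
    return sum(float(c) * x ** (2 * k + 1) for k, c in enumerate(arcsin_coeffs(k_max)))


print(arcsin_coeffs(4))  # the coefficients 1, 1/6, 3/40, 5/112, 35/1152 of (10)
print(arcsin_series(0.5), math.asin(0.5))
```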

But Newton had one other trick up his sleeve. Instead of a series for 
arclength (z) in terms of its coordinate (x), he sought to reverse the process. He wrote, “If, from the Arch αD given, the Sine AB was required, I extract the root of the equation found above” [18]. That is, Newton would
apply his inversion procedure to convert the series for z= arcsin x into 
one for x = sin z. 

Following the technique described earlier, we begin with x = z as the first term. To push the expansion to the next step, substitute x = z + p into (10) and solve to get

$$p = -\frac{1}{6}(z+p)^3 - \frac{3}{40}(z+p)^5 - \frac{5}{112}(z+p)^7 - \cdots,$$

from which we retain only $p = -\frac{1}{6}z^3$. This extends the series to $x = z - \frac{1}{6}z^3$. Next introduce $p = -\frac{1}{6}z^3 + q$ and continue the inversion process, solving for

$$q = \frac{1}{12}z^5 - \frac{3}{40}z^5 - \frac{1}{2}z^2 q + \cdots,$$

or simply $q = \frac{1}{120}z^5$. At this stage $x = z - \frac{1}{6}z^3 + \frac{1}{120}z^5$, and, as Newton might say, we “continue at pleasure” until discerning the pattern and writing down one of the most important series in analysis:

$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \frac{z^9}{9!} - \cdots$$
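
The inversion step can also be mimicked by machine. The sketch below does not follow Newton’s exact sequence of corrections p, q, and so on; instead it uses a fixed-point iteration x ← z − g(x), where g(x) collects the terms of (10) beyond x. Each pass fixes two more powers of z, and the result is the same truncated series. The truncation degree N = 9, the dictionary `G`, and the helper names are our own choices, and only the coefficients of (10) through $x^9$ are fed in.

```python
import math
from fractions import Fraction

N = 9  # keep every power series through z^9

# Series (10) is arcsin x = x + g(x); these are the coefficients of g
G = {3: Fraction(1, 6), 5: Fraction(3, 40), 7: Fraction(5, 112), 9: Fraction(35, 1152)}


def mul(p, q):
    """Multiply two power series (coefficient lists indexed by degree), truncated at degree N."""
    out = [Fraction(0)] * (N + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= N:
                out[i + j] += a * b
    return out


def g_of(x):
    """Coefficients of g(x(z)) when x is itself a power series in z."""
    out = [Fraction(0)] * (N + 1)
    power = [Fraction(1)] + [Fraction(0)] * N  # x(z)**0
    for k in range(1, N + 1):
        power = mul(power, x)  # now x(z)**k
        c = G.get(k, Fraction(0))
        out = [o + c * pk for o, pk in zip(out, power)]
    return out


# Solve z = x + g(x) for x by iterating x <- z - g(x), starting from x = z
z_series = [Fraction(0), Fraction(1)] + [Fraction(0)] * (N - 1)
x = list(z_series)
for _ in range(N):
    x = [zc - gc for zc, gc in zip(z_series, g_of(x))]

sine = {d: c for d, c in enumerate(x) if c != 0}
print(sine)  # the coefficients of z - z^3/3! + z^5/5! - z^7/7! + z^9/9!

approx = sum(float(c) * 0.5 ** d for d, c in sine.items())
print(approx, math.sin(0.5))
```

Because each coefficient of an inverted series depends only on coefficients of equal or lower degree in the original, truncating (10) at $x^9$ still yields the sine coefficients exactly through $z^9$.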


Newton’s series for sine and cosine (1669)


For good measure, Newton included the series for $\cos z = \sum_{k=0}^{\infty} \frac{(-1)^k z^{2k}}{(2k)!}$. In the words of Derek Whiteside, “These series for the sine and cosine . . . here appear for the first time in a European manuscript” [19].

To us, this development seems incredibly roundabout. We now regard
the sine series as a trivial consequence of Taylor's formula and differential 
calculus. It is so natural a procedure that we expect it was always so. But 
Newton, as we have seen, approached this very differently. He applied 
rules of integration, not of differentiation; he generated the sine series 
from the (to our minds) incidental series for the arcsine; and he needed 
his complicated inversion scheme to make it all work. 

This episode reminds us that mathematics did not necessarily evolve 
in the manner of today’s textbooks. Rather, it developed by fits and starts 
and odd surprises. Actually that is half the fun, for history is most intrigu- 
ing when it is at once significant, beautiful, and unexpected. 

On the subject of the unexpected, we add a word about Whiteside’s 
qualification in the passage above. It seems that Newton was not the first 
to discover a series for the sine. In 1545, the Indian mathematician 
Nilakantha (1445-1545) described this series and credited it to his even 
more remote predecessor Madhava, who lived around 1400. An account 
of these discoveries, and of the great Indian tradition in mathematics, can 
be found in [20] and [21]. It is certain, however, that these results were 
unknown in Europe when Newton was active. 

We end with two observations. First, Newton’s De analysi is a true 
classic of mathematics, belonging on the bookshelf of anyone interested in 
how calculus came to be. It provides a glimpse of one of history’s most fer- 
tile thinkers at an early stage of his intellectual development. 

Second, as should be evident by now, a revolution had begun. The 
young Newton, with a skill and insight beyond his years, had combined 
infinite series and fluxional methods to push the frontiers of mathematics 
in new directions. It was his contemporary, James Gregory (1638-1675), 
who observed that the elementary methods of the past bore the same rela- 
tionship to these new techniques “as dawn compares to the bright light of 
noon” [22]. Gregory’s charming description was apt, as we see time and 
again in the chapters to come. And first to travel down this exciting path 
was Isaac Newton, truly “a man of extraordinary genius and proficiency.” 


CHAPTER 2 




Leibniz 


Gottfried Wilhelm Leibniz 


Calculus may be unique in having as its founders two individuals
better known for other things. In the public mind, Isaac Newton tends to 
be regarded as a physicist, and his cocreator, Gottfried Wilhelm Leibniz 
(1646-1716), is likely to be thought of as a philosopher. This is both 
annoying and flattering—annoying in its disregard for their mathematical 
contributions and flattering in its recognition that it took more than just 
an ordinary genius to launch the calculus. 

Leibniz, with his varied interests and far-reaching contributions, had 
an intellect of phenomenal breadth. Besides philosophy and mathematics, 
he excelled in history, jurisprudence, languages, theology, logic, and 
diplomacy. When only 27, he was admitted to London's Royal Society for 
inventing a mechanical calculator that added, subtracted, multiplied, and 
divided—a machine that was by all accounts as revolutionary as it was 
complicated [1]. 

Like Newton, Leibniz had an intense period of mathematical activity,
although his came later than Newton’s and in a different country. Whereas
Newton developed his fluxional ideas at Cambridge in the mid-1660s, 
Leibniz did his groundbreaking work while on a diplomatic mission to 
Paris a decade later. This gave Newton temporal priority—which he and 
his countrymen would later assert was the only kind that mattered—but it 
was Leibniz who published his calculus at a time when the De analysi and 
other Newtonian treatises were gathering dust in manuscript form. Much 
has been written about the ensuing dispute over which of the two 
deserved credit for the calculus, and the story is not a pretty one [2]. Mod- 
ern scholars, centuries removed from passions both national and personal, 
recognize that the discoveries of Newton and Leibniz were made indepen- 
dently. Like an idea whose time had come, calculus was “in the air” and 
needed only a remarkably penetrating and integrative mind to bring it 
into existence. This Newton had. 

Just as surely, so did Leibniz. Upon his arrival in Paris in 1672, he was 
a novice who admitted to lacking “the patience to read through the long 
series of proofs” necessary for mathematical success [3]. Dissatisfied with 
his modest knowledge, he spent time filling gaps, reading mathematicians 
as venerable as Euclid (ca. 300 BCE) or as up-to-date as Pascal (1623-1662),
Barrow, and his sometime-mentor, Christiaan Huygens (1629-1695). At 
first it was hard going, but Leibniz persevered. He recalled that, in spite of 
his deficiencies, “it seemed to me, I do not know by what rash confidence in 
my own ability, that I might become the equal of these if I so desired” [4]. 

Progress was breathtaking. He wrote in one memorable passage that 
soon he was “ready to get along without help, for I read [mathematics] 
almost as one reads tales of romance” [5]. After absorbing, almost inhaling, 
the work of his contemporaries, Leibniz pushed beyond them all to create 
the calculus, thereby earning himself mathematical immortality. 

And, unlike Newton across the English Channel, Leibniz was willing 
to publish. The first printed version of the calculus was Leibniz’s 1684 paper 
bearing the long title, “Nova methodus pro maximis et minimis, itemque tan- 
gentibus, quae nec fractas, nec irrationales quantitates moratur, et singulare pro 
illis calculi genus.” This translates into “A New Method for Maxima and 
Minima, and also Tangents, which is Impeded Neither by Fractional Nor 
by Irrational Quantities, and a Remarkable Type of Calculus for This” [6]. 
With references to maxima, minima, and tangents, it should come as no 
surprise that the article was Leibniz’s introduction to differential calculus. 
He followed it two years later with a paper on integral calculus. Even at 
that early stage, Leibniz not only had organized and codified many of the 
basic calculus rules, but he was already using dx for the differential of x and $\int x\,dx$ for its integral. Among his other talents was his ability to provide what Laplace later called “a very happy notation” [7].


Leibniz’s first paper on differential calculus (1684)


In this chapter, we examine a pair of theorems from the years 1673-1674. Much of our discussion is drawn from Leibniz’s monograph Historia
et origo calculi differentialis, an account of the events surrounding his cre- 
ation of the calculus [8]. Our first result, more abstract, is known as the 
transmutation theorem. Although its geometrical convolutions may not 
appeal to modern tastes, it reveals his mathematical gift and leads to an 
early version of what we now call integration by parts. The second result, 
a consequence of the first, is the so-called “Leibniz Series.” Like Newton's
work, discussed in the previous chapter, this combined series expansions 
and basic integration techniques to produce an important and fascinating 
outcome. 


THE TRANSMUTATION THEOREM 


Finding areas beneath curves was a hot topic in the middle of the sev- 
enteenth century, and this is the subject of the Leibniz transmutation theo- 
rem. Suppose, in figure 2.1, we seek the area beneath the curve AB. Leibniz 
imagined this region as being composed of infinitely many “infinitesimal” 
rectangles, each of width dx and height y, where the latter varies with the 
shape of AB. 

Figure 2.1

To us today, the nature of Leibniz’s dx is unclear. In the seventeenth century, it was seen as a least possible length, an infinitely small magnitude that could not be further subdivided. But how is such a thing possible? Clearly any length, no matter how razor-thin, can be split in half. Leibniz’s explanations in this regard were of no help, for even he became unintelligible when addressing the matter. Consider the following passage from sometime after 1684:


by ... infinitely small, we understand something . . . indefinitely 
small, so that each conducts itself as a sort of class, and not merely 
as the last thing of a class. If anyone wishes to understand these 
[the infinitely small] as the ultimate things... , it can be done, 
and that too without falling back upon a controversy about the 
reality of extensions, or of infinite continuums in general, or of 
the infinitely small, ay even though he think that such things are 
utterly impossible. [9] 


The reader is forgiven for finding this clarification less than clarifying. 
Leibniz himself seemed to choose expediency over logic when he added 
that, even if the nature of these indivisibles is uncertain, they can nonethe- 
less be used as “a tool that has advantages for the purpose of the calcula- 
tion.” Again we glimpse the mathematical quagmire that would confront 
analysts of the future. But in 1673 Leibniz was eager to press on, and a 
later generation could tidy up the logic. 

Returning to figure 2.1, we see that the infinitesimal rectangle has area 
y dx. To calculate the area under the curve AB, Leibniz summed an infinitude of these areas. As a symbol for this process, he chose an elongated “S” (for “summa”) and thus denoted the area as $\int y\,dx$. Thereafter, his integral sign became the “logo” of calculus, announcing to all who saw it that higher mathematics was afoot.

Figure 2.2

It is one thing to have a notation for area and quite another to know 
how to compute it. Leibniz’s transmutation theorem was aimed at resolv- 
ing this latter question. 

His idea is illustrated in figure 2.2, which again shows curve AB, the area 
beneath which is our object. On the curve is an arbitrary point P with coor- 
dinates (x, y). At P, Leibniz constructed the tangent line t, meeting the verti- 
cal axis at point T with coordinates (0, z). Leibniz explained this construction 
by noting that “to find a tangent means to draw a line that connects two 
points of the curve at an infinitely small distance” [10]. Letting dx be an 
infinitesimal increment in x, he then created an infinitely small right triangle with hypotenuse PQ along the tangent line and having sides of length dx, dy, and ds, an enlargement of which appears in figure 2.3. We let α be the angle of inclination of this tangent line.

Leibniz stressed that, “Even though this triangle is indefinite (being 
infinitely small), yet. . . it was always possible to find definite triangles 
similar to it” [11]. Of course, one may wonder how an infinitely small 
triangle can be similar to anything, but this is not the time to quibble. Leibniz regarded △TDP in figure 2.2 as being similar to the infinitesimal triangle in figure 2.3. It followed that $\frac{dy}{dx} = \frac{PD}{TD} = \frac{y - z}{x}$, which he solved to get

$$z = y - x\frac{dy}{dx}. \tag{1}$$

Figure 2.3


Next, Leibniz extended his tangent line PT leftward, and from the origin drew segment OW of length h perpendicular to this extension (again, see figure 2.2). Because ∠PTD has measure α, we know that ∠OTW has measure π/2 − α, and so the measure of ∠TOW is α as well. This makes △OTW similar to the infinitesimal triangle, and so we generate another proportion $\frac{z}{h} = \frac{ds}{dx}$, from which we conclude that

$$h\,ds = z\,dx. \tag{2}$$


Leibniz then drew △OPQ radiating from the origin and having as base
the hypotenuse PQ of the infinitesimal triangle. In order not to clutter 
figure 2.2 any further, we redraw the diagram, with this particular trian- 
gle, in figure 2.4. 

By now, the reader may suspect that Leibniz was adrift, lost in a sea of 
pointless triangles. But in fact the oblique, infinitesimal triangle OPQ was 


central to his transmutation theorem. Because its base is of length PQ = ds and its altitude is OW = h, we see that its area is $\frac{1}{2}h\,ds$, which, by (2) above, is just $\frac{1}{2}z\,dx$.

Leibniz assembled an infinitude of these infinitesimal triangles, all 
radiating from the origin and terminating along AB, as shown in figure 2.5. 


Figure 2.4

Writing years later, Leibniz remembered that he “happened to have occasion to break up an area into triangles formed by a number of straight lines
meeting in a point, and . . . perceived that something new could be readily 
obtained from it” [12]. 

This polar perspective was critical, for Leibniz recognized that the 
area of the wedge in figure 2.5 was the sum of the areas of infinitesimal tri- 
angles whose analytic expression he had determined above. That is, 


$$\text{Area (wedge)} = \text{Sum of triangular areas} = \int \frac{1}{2}z\,dx = \frac{1}{2}\int z\,dx. \tag{3}$$

In truth, Leibniz was not primarily interested in the area of this wedge. Rather, he sought the area under curve AB in figure 2.1, that is, $\int y\,dx$. Fortunately it takes only a bit of tinkering to relate the areas in question, for the geometry of figure 2.6 shows that

$$\text{Area under curve AB} = \text{Area (wedge)} + \text{Area}(\triangle ObB) - \text{Area}(\triangle OaA).$$

This relationship, by (3), has the symbolic equivalent

$$\int y\,dx = \frac{1}{2}\int z\,dx + \frac{1}{2}b\,y(b) - \frac{1}{2}a\,y(a). \tag{4}$$


Figure 2.5

Figure 2.6

Here at last is the transmutation theorem. The name indicates that the original integral $\int y\,dx$ has been transformed (or “transmuted”) into a sum of the new integral $\frac{1}{2}\int z\,dx$ and the constant $\frac{1}{2}b\,y(b) - \frac{1}{2}a\,y(a)$. Today we might find it more palatable to insert limits of integration (a notational device Leibniz did not employ) and recast the theorem as

$$\int_a^b y\,dx = \frac{1}{2}\int_a^b z\,dx + \frac{1}{2}\Big[xy\Big]_a^b. \tag{5}$$
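
Formula (5) is easy to test numerically for a particular curve. The sketch below is our own illustration: it picks the hypothetical sample curve $y = e^{x/2}$ on [0, 2], builds the quadratrix from (1) as z = y − x·dy/dx, and compares both sides of (5) using a simple midpoint rule (the helper `integrate` and its step count are assumptions, not anything from Leibniz). For this curve both sides should equal 2(e − 1).

```python
import math


def integrate(f, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def y(x):
    return math.exp(x / 2)  # hypothetical sample curve


def dy_dx(x):
    return 0.5 * math.exp(x / 2)  # its derivative


def z(x):
    # Quadratrix from (1): the y-intercept of the tangent line at (x, y)
    return y(x) - x * dy_dx(x)


a, b = 0.0, 2.0
lhs = integrate(y, a, b)
rhs = 0.5 * integrate(z, a, b) + 0.5 * (b * y(b) - a * y(a))
print(lhs, rhs, 2 * (math.e - 1))
```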


Formula (5) is notable for at least two reasons. 

First, it is possible that the “new” integral in z may be easier to evaluate than the original one in y. If so, z would play an auxiliary role in finding the original area. For seventeenth-century mathematicians, a curve
playing such a role was called a quadratrix, that is, a facilitator of quadra- 
ture. If it produced a simpler integral, then this whole, long process would 
pay off. As we shall see in a moment, this is exactly what happened in the 
derivation of the Leibniz series. 

The relationship in (5) has a theoretical significance as well. Recall 
that z= z(x) was the y-intercept of the line tangent to the curve AB at the 
point (x, y). The value of z thus depends on the slope of the tangent line 
and so injects the derivative into this mix of integrals. One senses that an 
important connection is lurking in the wings. 
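
To see it, we recall from (1) that $z = y - x\frac{dy}{dx}$ and so $z\,dx = y\,dx - x\,dy$. Substituting into (4), we have

$$\begin{aligned}
\int y\,dx &= \frac{1}{2}\int z\,dx + \frac{1}{2}b\,y(b) - \frac{1}{2}a\,y(a)\\
&= \frac{1}{2}\int \big[y\,dx - x\,dy\big] + \frac{1}{2}b\,y(b) - \frac{1}{2}a\,y(a)\\
&= \frac{1}{2}\int y\,dx - \frac{1}{2}\int x\,dy + \frac{1}{2}b\,y(b) - \frac{1}{2}a\,y(a),
\end{aligned}$$

which we solve to conclude that $\int y\,dx = b\,y(b) - a\,y(a) - \int x\,dy$.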
Again, limits of integration can be inserted to give

$$\int_a^b y\,dx = \Big[xy\Big]_a^b - \int_{y(a)}^{y(b)} x\,dy. \tag{6}$$

The geometric validity of (6) is evident in figure 2.7, for $\int_a^b y\,dx$ is the area of the region with vertical strips, whereas $\int_{y(a)}^{y(b)} x\,dy$ is the area of that with horizontal strips. Their sum is clearly the difference in area between the outer rectangle and the small one in the lower left-hand corner. That is,

$$\int_a^b y\,dx + \int_{y(a)}^{y(b)} x\,dy = b\,y(b) - a\,y(a),$$

which can be rearranged into (6).
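
The geometric argument behind (6) can be checked the same way. In this sketch (the increasing curve $y = x^3$ on [1, 2] is our own choice, as is the midpoint-rule helper), the vertical-strip area is computed in x and the horizontal-strip area in y, using the inverse function $x = y^{1/3}$. Their sum should match $b\,y(b) - a\,y(a) = 2\cdot 8 - 1\cdot 1 = 15$.

```python
def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def y_of_x(x):
    return x ** 3  # sample increasing curve (our choice)


def x_of_y(y):
    return y ** (1 / 3)  # its inverse, needed for the horizontal strips


a, b = 1.0, 2.0
vertical = integrate(y_of_x, a, b)                     # region with vertical strips
horizontal = integrate(x_of_y, y_of_x(a), y_of_x(b))   # region with horizontal strips
print(vertical, horizontal, vertical + horizontal, b * y_of_x(b) - a * y_of_x(a))
```

Here the vertical strips give 15/4 and the horizontal strips 45/4, so the sum is the 15 predicted by the rectangle picture.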

There is something else about (6) that bears comment: it looks famil- 
iar. So it should, because it follows easily from the well-known scheme for 
integration by parts 


$$\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b g(x)f'(x)\,dx,$$


if we specify g(x) =x and f(x) =y. In that case g’(x) = 1 and f’(x)dx = dy, 
and a substitution converts the integration-by-parts formula into the 
transmutation theorem. After all of Leibniz’s convoluted reasoning with its 
infinitesimals and tangent lines, its similar triangles and wedge-shaped 
areas—in short, after a most circuitous mathematical journey—we arrive at 
an instance of integration by parts, a calculus superstar making an early and 
unexpected entrance onto the stage. 

This was intriguing, but Leibniz was not finished. By applying his trans- 
mutation theorem to a well-known curve, he discovered the infinite series 
that still carries his name. 

Figure 2.7 (the curve $y = y(x)$)


THE LEIBNIZ SERIES 


Leibniz began with a circular arc. Specifically, he considered a circle of 
radius 1 and center at (1, 0) and let the curve AB from his general trans- 
mutation theorem be the quadrant of this circle shown in figure 2.8. As 
will become evident momentarily, it was an inspired choice. 

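The circle’s equation is $(x-1)^2 + y^2 = 1$ or, alternately, $x^2 + y^2 = 2x$. From the geometry of the situation, it is clear that the area beneath the quadrant is π/4, and so by (1) and (5) we have

$$\frac{\pi}{4} = \int_0^1 y\,dx = \frac{1}{2}\Big[xy\Big]_0^1 + \frac{1}{2}\int_0^1 z\,dx = \frac{1}{2} + \frac{1}{2}\int_0^1 z\,dx, \quad\text{where } z = y - x\frac{dy}{dx}.$$
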
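Using his newly created calculus, Leibniz differentiated the circle’s equation to get $2x\,dx + 2y\,dy = 2\,dx$, and so $\frac{dy}{dx} = \frac{1-x}{y}$. This led to the simplification

$$z = y - x\frac{dy}{dx} = y - \frac{x(1-x)}{y} = \frac{y^2 + x^2 - x}{y} = \frac{2x - x}{y} = \frac{x}{y}.$$
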
Leibniz’s objective was to find an expression for x in terms of the quadratrix z, and so he squared the previous result and again used the equation of the circle to get

$$z^2 = \frac{x^2}{y^2} = \frac{x^2}{2x - x^2} = \frac{x}{2-x}, \quad\text{which he solved for}\quad x = \frac{2z^2}{1+z^2}. \tag{7}$$

Figure 2.8


The challenge was to evaluate $\int_0^1 z\,dx$, the shaded area in figure 2.9. A look at the graph of the quadratrix $z = \sqrt{\frac{x}{2-x}}$ and an observation similar to the one above shows that

$$\int_0^1 z\,dx = \text{Area (shaded region)} = \text{Area (square)} - \text{Area (upper region)} = 1 - \int_0^1 x\,dz. \tag{8}$$


Returning to the transmutation theorem, Leibniz combined (7) and 
(8) as follows: 


$$\frac{\pi}{4} = \frac{1}{2} + \frac{1}{2}\int_0^1 z\,dx = \frac{1}{2} + \frac{1}{2}\left[1 - \int_0^1 x\,dz\right] = 1 - \frac{1}{2}\int_0^1 x\,dz = 1 - \int_0^1 \frac{z^2}{1+z^2}\,dz.$$

Figure 2.9


He rewrote this last integrand as 


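$$\frac{z^2}{1+z^2} = z^2\cdot\frac{1}{1+z^2} = z^2\left[1 - z^2 + z^4 - z^6 + \cdots\right] = z^2 - z^4 + z^6 - z^8 + \cdots,$$

where a geometric series has appeared within the brackets. From this, Leibniz concluded that

$$\frac{\pi}{4} = 1 - \int_0^1 \left[z^2 - z^4 + z^6 - z^8 + \cdots\right]dz = 1 - \left[\frac{z^3}{3} - \frac{z^5}{5} + \frac{z^7}{7} - \frac{z^9}{9} + \cdots\right]_0^1 = 1 - \left[\frac{1}{3} - \frac{1}{5} + \frac{1}{7} - \frac{1}{9} + \cdots\right],$$

or simply

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots \tag{9}$$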


This is the Leibniz series. 
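
Each link in Leibniz’s chain, (7) and (8) in particular, can be checked numerically. The sketch below (our own, with a simple midpoint-rule helper that is an assumption of the illustration) computes $\int_0^1 z\,dx$ for the quadratrix $z = \sqrt{x/(2-x)}$, compares it with $1 - \int_0^1 x\,dz$ using $x = 2z^2/(1+z^2)$, and confirms that $\frac12 + \frac12\int_0^1 z\,dx$ lands on π/4.

```python
import math


def integrate(f, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def quadratrix(x):
    return math.sqrt(x / (2 - x))  # z = x / y on the circle x^2 + y^2 = 2x


def x_from_z(z):
    return 2 * z * z / (1 + z * z)  # equation (7)


shaded = integrate(quadratrix, 0.0, 1.0)    # the integral of z dx
via_8 = 1 - integrate(x_from_z, 0.0, 1.0)   # right-hand side of (8)
quarter_pi = 0.5 + 0.5 * shaded             # pi/4 from the transmutation theorem
print(shaded, via_8, quarter_pi, math.pi / 4)
```

Both routes give π/2 − 1 for the shaded area, and the last two printed values agree to within the rule's small discretization error.
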
What a wonderful series it is. The terms follow an absolutely trivial 
pattern: the reciprocals of the odd integers with alternating signs. Yet this 
innocuous-looking expression sums to, of all things, π/4. Leibniz recalled that when he first communicated the result to Huygens, he received rave
reviews, for “the latter praised it very highly, and when he returned the 
dissertation said, in the letter that accompanied it, that it would be a dis- 
covery always to be remembered among mathematicians” [13]. 

The significance of this discovery, according to Leibniz, was that “it 
was now proved for the first time that the area of a circle was exactly equal 
to a series of rational quantities” [14]. One may quibble with his use of 
“exactly,” but it is hard to argue with his enthusiasm. 

He added a curious postscript. By dividing each side of (9) in half and 
grouping the terms, Leibniz saw that 


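$$\begin{aligned}
\frac{\pi}{8} &= \left(\frac{1}{2} - \frac{1}{6}\right) + \left(\frac{1}{10} - \frac{1}{14}\right) + \left(\frac{1}{18} - \frac{1}{22}\right) + \left(\frac{1}{26} - \frac{1}{30}\right) + \cdots\\
&= \frac{1}{3} + \frac{1}{35} + \frac{1}{99} + \frac{1}{195} + \cdots\\
&= \frac{1}{2^2 - 1} + \frac{1}{6^2 - 1} + \frac{1}{10^2 - 1} + \frac{1}{14^2 - 1} + \cdots
\end{aligned}$$

In words, this says that if we diminish by 1 the square of every other even number starting with 2 and then add the reciprocals, the sum is π/8. How strange. One is reminded that formulas from analysis can border on the magical.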
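
Leibniz’s postscript is equally easy to test. The even numbers in question are 2, 6, 10, 14, and so on, that is, 4k + 2 for k = 0, 1, 2, ...; the sketch below (our own, truncated at an arbitrary 100,000 terms) adds the reciprocals of their squares diminished by 1 and compares the total with π/8. The terms shrink like $1/(16k^2)$, so the truncation error is roughly $1/(16\cdot 100000)$.

```python
import math

# Every other even number starting with 2: 2, 6, 10, 14, ... = 4k + 2
terms = [1 / ((4 * k + 2) ** 2 - 1) for k in range(100_000)]

print(terms[:4])  # 1/3, 1/35, 1/99, 1/195 as floats
print(sum(terms), math.pi / 8)
```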

The Leibniz series, remarkable as it is, has no value as a numerical approximator of π. The series converges, but it does so with excruciating slowness. One could add the first 300 terms of the Leibniz series and still have π accurate to only a single decimal place. Such dreadful precision would not be worth the effort. However, as we shall see, a related infinite series would, in the hands of Euler, produce a highly efficient scheme for approximating π.
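
To see how slowly the series creeps toward π, the sketch below (our own illustration; the function name and the sample term counts are arbitrary) multiplies partial sums of the Leibniz series by 4. With 300 terms the estimate is still about 3.138, off by roughly 1/300.

```python
import math


def leibniz_partial_sum(n_terms):
    """Sum of the first n_terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))


for n in (10, 100, 300, 1000):
    estimate = 4 * leibniz_partial_sum(n)
    print(f"{n:5d} terms: pi ~ {estimate:.6f} (error {math.pi - estimate:+.6f})")
```

Multiplying the number of terms by ten buys only about one more correct digit, which is the practical meaning of “excruciating slowness.”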

Unquestionably, the Leibniz series is a calculus masterpiece. As is cus- 
tomary when discussing these early results, however, we must offer a few 
words of caution. For one thing, the transmutation theorem used infinitesi- 
mal reasoning. For another, evaluating his series required Leibniz to replace 
the integral of an infinite sum by the sum of infinitely many integrals, a pro- 
cedure whose subtleties would be addressed in the centuries to come. 

And there was one other problem: Leibniz was not the first to discover 
this series. The British mathematician James Gregory had found something 
very similar a few years before. Gregory had, in fact, come upon an expansion for arctangent, namely,

$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots,$$

which, for x = 1, is the Leibniz series (although Gregory may never have 
actually made the substitution to convert this to a series of numbers). 

Leibniz, a mathematical novice in 1674, was unaware of Gregory's 
work and believed he had hit upon something new. This in turn led his 
British counterparts to regard him with some suspicion. To them, Leibniz 
had a tendency to claim credit for the achievements of others. These suspi- 
cions, of course, would be magnified early in the eighteenth century when 
the British, under the direction of Newton himself, accused Leibniz of out- 
right plagiarism in stealing the calculus. The confusion over the series $\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots$ was seen as an early instance of Leibniz’s perfidy.


But even Gregory was not the first down this path. The Indian mathe- 
matician Nilakantha, whom we met in the previous chapter, described this 
series—in verse, no less—in a work called the Tantrasangraha [15]. 
Although it was unknown in Europe during Leibniz’s day, this achievement 
serves as a reminder that mathematics is a universal human enterprise. 

The work of Gregory and Nilakantha notwithstanding, we know that
Leibniz’s derivation of this series was not theft. He later wrote that in 1674 
neither he nor Huygens “nor yet anyone else in Paris had heard anything 
at all by report concerning the expression of the area of a circle by means 
of an infinite series of rationals” [16]. The Leibniz series, like the calculus 
generally, was a personal triumph. 

Over the next two decades, the novice would become the master as 
Leibniz refined, codified, and published his ideas on differential and inte- 
gral calculus. From such beginnings, the subject would grow—indeed, 
would explode—in the century to come. We continue this story with a 
look at his two most distinguished followers, the Bernoulli brothers of 
Switzerland. 


CHAPTER 3 




The Bernoullis 


Jakob Bernoulli Johann Bernoulli 


A scientific revolution often needs more than a founding genius. It 
may require as well an organizational genius to identify the key ideas, trim 
off their rough edges, and make them comprehensible to a wider audi- 
ence. A brilliant architect, after all, may have a vision, but it takes a con- 
struction team to turn that vision into a building. 

If Newton and Leibniz were the architects of the calculus, it was the 
Bernoulli brothers, Jakob (1654-1705) and Johann (1667-1748), who 
did much to build it into the subject we know today. The brothers read 
Leibniz’s original papers from 1684 and 1686 and found them as exhila- 
rating as they were challenging. They grappled with the dense exposition, 
fleshed out its details, and then, in correspondence with Leibniz and with 
one another, provided coherence, structure, and terminology. It was Jakob, 
for instance, who gave us the word “integral” [1]. In their hands, the calcu- 
lus assumed a form easily recognizable to a student of today, with its basic 
rules of derivatives, techniques of integration, and solutions of elementary
differential equations. 

Although excellent mathematicians, the Bernoulli brothers exhibited a 
personal behavior best described as “unbecoming.” Johann, in particular, 
assumed the combative role of Leibniz’s bulldog in the calculus wars with 
Newton, remaining loyal to his hero, whom he called the “celebrated Leib- 
niz,” and going so far as to suggest that not only did Newton fail to invent 
calculus but he never completely understood it [2]! This was certainly a 
brazen attack on one of history’s greatest mathematicians.

Unfortunately for family harmony, Jakob and Johann were only too 
happy to do battle with one another. Older brother Jakob, for instance, 
would refer to Johann as “my pupil,” even when the pupil’s talents were 
clearly equal to his own. And, decades after the fact, Johann gleefully 
recalled solving in a single night a problem that had stumped Jakob for the 
better part of a year [3]. 

Their difficult natures notwithstanding, the Bernoullis left deep foot- 
prints. Besides his contributions to calculus, Jakob wrote the Ars conjectan- 
di, posthumously published in 1713. This work is a classic of probability 
theory that features a proof of the law of large numbers, a fundamental 
result that is sometimes called “Bernoulli’s theorem” in his honor [4].
For his part, Johann was the ghostwriter of the world’s first calculus text. 
This came to pass because of an agreement to supply calculus lessons, for 
a fee, to a French nobleman, the Marquis de l’Hospital (1661-1704). L’Hospital, in turn, assembled and published these in 1696 under the title Analyse des infiniment petits pour l’intelligence des lignes courbes (Analysis of the Infinitely Small for the Understanding of Curved Lines). In this work first appeared “l’Hospital’s rule,” a fixture of differential calculus ever since, although it, like so much of the book, was actually Johann Bernoulli’s [5]. In the preface, l’Hospital acknowledged his debt to Bernoulli and
Leibniz when he wrote, “I have made free use of their discoveries so that I 
frankly return to them whatever they please to claim as their own” [6]. 

The irascible Johann, who indeed claimed the rule, was not satisfied 
with this gesture and in later years grumbled that l’Hospital had cashed in
on the talents of others. Of course it was Bernoulli who (literally) did the 
cashing in, as math historian Dirk Struik reminded us with this succinct rec- 
ommendation: “Let the good Marquis keep his elegant rule; he paid for it” 
[7]. To avoid losing glory a second time, Johann wrote an extensive treatise 
on integral calculus that was published, under his own name, in 1742 [8]. 

To get a clearer sense of their mathematical achievements, we shall con- 
sider selected works from each brother. We begin with Jakob’s divergence 
proof of the harmonic series, then examine his treatment of some curious 
convergent series, and conclude with Johann’s contributions to what he
called the “exponential calculus.” 


JAKOB AND THE HARMONIC SERIES 


Like Newton and Leibniz before him—and so many afterward—Jakob 
Bernoulli regarded infinite series as a natural pathway into analysis. This 
was evident in his 1689 work, Tractatus de seriebus infinitis earumque summa 
finita (Treatise on Infinite Series and Their Finite Sums), a state-of-the-art 
discussion of infinite series as they were understood near the end of the 
seventeenth century [9]. Jakob considered such familiar series as the geo- 
metric, binomial, arctangent, and logarithmic, as well as some previously 
unexamined ones. In this chapter, we look at two excerpts from the Tracta- 
tus, the first of which addressed the strange behavior of the harmonic 
series. 


Long before 1689, others had recognized that $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$ diverges to infinity. Nicole Oresme (ca. 1323-1382) devised the proof
found in most modern texts, and Pietro Mengoli (1625-1686) came up 
with an alternate demonstration in 1650 [10]. Leibniz, perhaps unaware of 
these predecessors, discovered divergence during his early Paris years and informed his British contacts that, in his words, $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots = \infty$, only to learn from them that he had been scooped once again [11].

So, the divergence of the harmonic series was hardly news. But we 
may gain insight, not to mention the charm of variety, by following alter- 
nate routes to the same end. Jakob Bernoulli's divergence proof, quite dif- 
ferent from those of his predecessors, is such an alternative. 

He began by comparing two types of progressions that held center 
stage in his day: the geometric and the arithmetic. The former he 
described as $A, B, C, D, \ldots$, where $B/A = C/B = D/C$, etc.; for example, $2, 1, 1/2, 1/4, \ldots$. The latter, he wrote, had the form $A, B, C, D, \ldots$, where $B - A = C - B = D - C$, etc.; an example is $2, 5, 8, 11, \ldots$. The modern convention, of course, is to emphasize the common ratio ($r$) in geometric progressions and the common difference ($d$) in arithmetic ones, so that we denote a geometric progression by $A, Ar, Ar^2, Ar^3, \ldots$ and an arithmetic one by $A, A+d, A+2d, A+3d, \ldots$.

As the fourth proposition of his Tractatus, Jakob proved a lemma 
about geometric and arithmetic progressions of positive numbers that 
begin with the same first two terms. 
Theorem: If $A, B, C, \ldots, D, E$ is a geometric progression of positive numbers with common ratio $r > 1$, and if $A, B, F, \ldots, G, H$ is an arithmetic progression of positive numbers also beginning with $A$ and $B$, then the remaining entries of the geometric progression are greater, term by term, than their arithmetic counterparts.


Proof: Using modern notation, we denote the geometric progression as $A, Ar, Ar^2, Ar^3, \ldots$ and the arithmetic one as $A, A+d, A+2d, A+3d, \ldots$. By hypothesis, $Ar = B = A + d$. Because $r > 1$, we have $A(r-1)^2 > 0$, from which it follows that

$$Ar^2 + A > 2Ar,$$

or simply $C + A > 2B = 2(A + d) = A + (A + 2d) = A + F$.

Thus $C > F$; that is, the third term of the geometric series exceeds the third term of the arithmetic one, as claimed. This can be repeated to the fourth, fifth, and indeed to any term down the line. Q.E.D.
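A quick exact check of the lemma, assuming positive $A < B$ so that $r = B/A > 1$ (the helper name `compare_progressions` is ours): the geometric terms match the arithmetic ones in the first two places and pull ahead from the third on.

```python
from fractions import Fraction

def compare_progressions(A, B, n):
    """Return n terms each of the geometric and arithmetic progressions starting A, B."""
    r = B / A   # common ratio of the geometric progression
    d = B - A   # common difference of the arithmetic progression
    geometric = [A * r**k for k in range(n)]
    arithmetic = [A + k * d for k in range(n)]
    return geometric, arithmetic

g, a = compare_progressions(Fraction(2), Fraction(3), 12)
assert g[:2] == a[:2]                              # same first two terms
assert all(x > y for x, y in zip(g[2:], a[2:]))    # geometric ahead from the third term on
print([float(x) for x in g[:5]], [float(y) for y in a[:5]])
```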


A few propositions later, Jakob proved the following result, stated in characteristic seventeenth-century fashion.


Theorem: In any finite geometric progression A, B, C,..., D, E, the first 
term is to the second as the sum of all terms except the last is to the 
sum of all except the first. 


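Proof: Once we master the unfamiliar language, this is easily verified because

$$\frac{A}{B} = \frac{A}{Ar} = \frac{A(1 + r + r^2 + \cdots + r^{n-1})}{Ar(1 + r + r^2 + \cdots + r^{n-1})} = \frac{A + Ar + Ar^2 + \cdots + Ar^{n-1}}{Ar + Ar^2 + \cdots + Ar^{n-1} + Ar^n} = \frac{A + B + C + \cdots + D}{B + C + \cdots + D + E}.$$

Q.E.D.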


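Next, Jakob determined the sum of a finite geometric progression. Letting $S = A + B + C + \cdots + D + E$ be the sum in question, he applied the previous result to get $\frac{A}{B} = \frac{S - E}{S - A}$ and then, cross-multiplying to obtain $A(S - A) = B(S - E)$, solved for

$$S = \frac{A^2 - BE}{A - B}. \qquad (1)$$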
Note that (1) employs the first term ($A$), the second term ($B$), and the last term ($E$) of the finite geometric series, unlike the standard summation formula of today:

$$A + Ar + Ar^2 + \cdots + Ar^n = \frac{A(1 - r^{n+1})}{1 - r},$$

which employs the first term, the number of terms, and the common ratio.
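Formula (1) and the modern formula can be compared with exact rational arithmetic. In this sketch (the function names `jakob_sum` and `modern_sum` are ours), both are checked against a brute-force sum, along with the earlier ratio theorem $A/B = (A + \cdots + D)/(B + \cdots + E)$, for one ratio below 1 and one above.

```python
from fractions import Fraction

def jakob_sum(A, B, E):
    """Formula (1): S = (A^2 - B*E) / (A - B), using first, second, and last terms."""
    return (A * A - B * E) / (A - B)

def modern_sum(A, r, n):
    """A + Ar + ... + Ar^n = A(1 - r^(n+1)) / (1 - r), valid for r != 1."""
    return A * (1 - r ** (n + 1)) / (1 - r)

for A, r, n in [(Fraction(3), Fraction(2, 5), 6), (Fraction(1), Fraction(3), 4)]:
    terms = [A * r**k for k in range(n + 1)]
    # The ratio theorem: A/B = (all terms but the last) / (all terms but the first).
    assert terms[0] / terms[1] == sum(terms[:-1]) / sum(terms[1:])
    assert jakob_sum(terms[0], terms[1], terms[-1]) == sum(terms) == modern_sum(A, r, n)
print("formula (1) agrees with the modern formula on both test cases")
```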

With these preliminaries aside, we are now ready for Jakob’s analysis of the harmonic series. It appeared in the Tractatus immediately after a divergence proof credited to Johann [12]. Including his younger brother's work may seem unexpectedly generous, but Jakob rose to the challenge and gave his own alternative. In his words, the goal was to prove that “the sum of the infinite harmonic series $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$ surpasses any given number. Therefore it is infinite” [13].

Theorem: The harmonic series diverges. 


Proof: Choosing an arbitrary whole number N, Jakob sought to remove 
from the beginning of the harmonic series finitely many consecutive 
terms whose sum is equal to or greater than 1. From what remained, 
he extracted a finite string of consecutive terms whose sum equals or 
exceeds another unity. He continued in this fashion until N such 
strings had been removed, making the sum of the entire harmonic
series at least as big as N. Because N was arbitrary, the harmonic series
is infinite. 

This procedure, taken almost verbatim from Jakob’s original, is 
fine provided we can always remove a finite string of terms whose sum 
is 1 or more. To complete the argument, Bernoulli had to demonstrate 
that this is indeed the case. He thus assumed the opposite, stating, “If, 
after having removed a number of terms, you deny that it is possible 
for the rest to surpass unity, then let 1/a be the first remaining term 
after the last removal.” In other words, for the sake of contradiction, he supposed that the sum $\frac{1}{a} + \frac{1}{a+1} + \frac{1}{a+2} + \cdots$ remains below 1 no matter how far we carry it. But these denominators $a, a+1, a+2, \ldots$ form an arithmetic progression, so Jakob introduced the geometric progression beginning with the same first two terms. That is, he considered the geometric progression $a, a+1, C, D, \ldots, K$, where he insisted that we continue until $K \ge a^2$. This is possible because the terms of the progression have a common ratio $r = \frac{a+1}{a} > 1$ and thus grow arbitrarily large.


As we saw above, Jakob knew that the terms of the geometric progression exceed those of their arithmetic counterpart, and so, upon taking reciprocals, he concluded that

$$\frac{1}{a} + \frac{1}{a+1} + \frac{1}{a+2} + \cdots \ge \frac{1}{a} + \frac{1}{a+1} + \frac{1}{C} + \frac{1}{D} + \cdots + \frac{1}{K},$$

where the expression on the left has the same (finite) number of terms as that on the right. He then summed the geometric series using (1) with $A = 1/a$, $B = 1/(a+1)$, and $E = 1/K \le 1/a^2$ to get

$$\frac{1}{a} + \frac{1}{a+1} + \frac{1}{a+2} + \cdots \ge \frac{\dfrac{1}{a^2} - \dfrac{1}{a+1}\cdot\dfrac{1}{K}}{\dfrac{1}{a} - \dfrac{1}{a+1}} \ge \frac{\dfrac{1}{a^2} - \dfrac{1}{(a+1)a^2}}{\dfrac{1}{a} - \dfrac{1}{a+1}} = \frac{1/(a(a+1))}{1/(a(a+1))} = 1,$$

a contradiction of his initial assumption. In this way Jakob established that, starting at any point of the harmonic series, a finite portion of what remained must sum to one or more.
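Jakob's construction can be replayed exactly with Python's `fractions.Fraction`. This sketch is our own (`bernoulli_block` is a hypothetical helper): it builds the geometric progression $a, a+1, \ldots$ until a term $K \ge a^2$, then compares reciprocal sums. The harmonic terms should dominate the geometric ones, and the geometric sum should already reach 1.

```python
from fractions import Fraction

def bernoulli_block(a: int):
    """Replay Jakob's construction for the harmonic tail that starts at 1/a.

    Returns (n, harmonic, geometric): the number n of terms in the geometric
    progression a, a+1, C, D, ..., K (stopping at the first K >= a^2), the sum
    1/a + 1/(a+1) + ... of n consecutive harmonic terms, and the sum of the
    reciprocals 1/a + 1/(a+1) + 1/C + ... + 1/K.
    """
    r = Fraction(a + 1, a)                 # common ratio (a+1)/a
    progression = [Fraction(a)]
    while progression[-1] < a * a:         # keep going until K >= a^2
        progression.append(progression[-1] * r)
    n = len(progression)
    harmonic = sum(Fraction(1, a + j) for j in range(n))
    geometric = sum(1 / t for t in progression)
    return n, harmonic, geometric

for a in (2, 5, 26):
    n, h, g = bernoulli_block(a)
    print(f"a = {a}: {n} terms, harmonic {float(h):.4f} >= geometric {float(g):.4f}")
```

For $a = 5$ the construction needs ten terms, $\frac{1}{5}$ through $\frac{1}{14}$; a block that runs all the way to $\frac{1}{25}$ only adds more positive terms.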

To complete the proof, he used this scheme to break up the harmonic series as

$$1 + \left(\frac{1}{2} + \frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \cdots + \frac{1}{25}\right) + \left(\frac{1}{26} + \cdots + \frac{1}{676}\right) + \left(\frac{1}{677} + \cdots + \frac{1}{458329}\right) + \cdots,$$

where each parenthetical expression exceeds 1. The resulting sum can therefore be made greater than any preassigned number, and so the harmonic series diverges. Q.E.D.
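The blocks in this decomposition can be summed directly. In the sketch below (our helper `block_sum`), each block runs from $1/a$ to $1/a^2$ and the next block starts just after, which reproduces the endpoints 4, 25, 676, and 458329; every block sum exceeds 1.

```python
def block_sum(lo: int, hi: int) -> float:
    """Return 1/lo + 1/(lo+1) + ... + 1/hi in floating point."""
    return sum(1.0 / k for k in range(lo, hi + 1))

blocks = []
start = 2
for _ in range(4):
    end = start * start            # each block runs from 1/a to 1/a^2
    blocks.append((start, end, block_sum(start, end)))
    start = end + 1

for lo, hi, total in blocks:
    print(f"1/{lo} + ... + 1/{hi} = {total:.4f}")
```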


This was a clever argument. Its significance was not lost on Jakob, who 
emphasized that, “The sum of an infinite series whose final term vanishes is 
perhaps finite, perhaps infinite” [14]. Of course, no modern mathematician 
refers to the “final term” of an infinite series, but Jakob’s intent is clear: even 
though the general term of an infinite series shrinks away to zero, this is 
not sufficient to guarantee convergence. The harmonic series stands as the
great example to illustrate this point. So it was for Jakob Bernoulli, and so 
it remains today. 


JAKOB AND HIS FIGURATE SERIES


The harmonic series was of interest because of its bad, that is, diver- 
gent, behavior. Of equal interest were well-behaved infinite series having 
finite sums. Starting with the geometric series and cleverly modifying the 
outcome, Jakob proceeded until he could calculate the exact values of 
some nontrivial series. We consider a few of these below. 

First he needed the sum of an infinite geometric progression. As noted in (1), Bernoulli summed a finite geometric series with the formula

$$A + B + C + \cdots + D + E = \frac{A^2 - BE}{A - B}.$$

As a corollary he observed that, for an infinite geometric progression of positive terms whose common ratio is less than 1, the general term must approach zero. So he simply let his “last” term $E = 0$ to arrive at

$$A + B + C + \cdots + D + \cdots = \frac{A^2}{A - B}. \qquad (2)$$

Arithmetic and geometric progressions were not the only patterns familiar to mathematicians of the seventeenth century. So too were the “figurate numbers,” families of integers related to such geometrical entities as triangles, pyramids, and cubes. As an example we have the triangular numbers $1, 3, 6, 10, 15, \ldots$, so named because they count the points in the ever-expanding triangles shown in figure 3.1. It is easy to see that the $k$th triangular number is $1 + 2 + \cdots + k = \frac{k(k+1)}{2} = \binom{k+1}{2}$, where the binomial coefficient is a notation postdating Jakob Bernoulli.

Figure 3.1: triangular arrays of 1, 3, 6, 10, and 15 points

Likewise, the pyramidal numbers are $1, 4, 10, 20, 35, \ldots$, which count the number of cannonballs in pyramidal stacks with triangular bases. It can be shown that the $k$th pyramidal number is $\frac{k(k+1)(k+2)}{6} = \binom{k+2}{3}$. Of course, the square numbers $1, 4, 9, 16, 25, \ldots$ and the cubic numbers $1, 8, 27, 64, 125, \ldots$ have geometric significance as well.

Bernoulli’s interest in such matters took the following form: he wanted to find the exact sum of an infinite series $\frac{a}{A} + \frac{b}{B} + \frac{c}{C} + \cdots + \frac{d}{D} + \cdots$, where the numerators $a, b, c, \ldots, d, \ldots$ were figurate numbers and the denominators $A, B, C, \ldots, D, \ldots$ constituted a geometric progression. For instance, he wished to evaluate such series as $\sum_{k=1}^{\infty} \binom{k+2}{3}\Big/5^k$ or $\sum_{k=1}^{\infty} k^3/2^k$. These were challenging questions at the time.

Jakob attacked the problem by building from the simple to the 
complicated—always a good mathematical strategy. Following his argu- 
ments, we begin with an infinite series having the natural numbers as 
numerators and a geometric progression as denominators [15]. 


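Theorem N: If $d > 1$, then $N = \frac{1}{b} + \frac{2}{bd} + \frac{3}{bd^2} + \frac{4}{bd^3} + \frac{5}{bd^4} + \cdots = \frac{d^2}{b(d-1)^2}$.

Proof: Jakob let $N = \frac{1}{b} + \frac{2}{bd} + \frac{3}{bd^2} + \frac{4}{bd^3} + \frac{5}{bd^4} + \cdots$ and decomposed it into a sequence of infinite geometric series, each of which he summed by (2):

$$\begin{aligned}
\frac{1}{b} + \frac{1}{bd} + \frac{1}{bd^2} + \frac{1}{bd^3} + \frac{1}{bd^4} + \cdots &= \frac{(1/b)^2}{1/b - 1/(bd)} = \frac{d}{b(d-1)},\\
\frac{1}{bd} + \frac{1}{bd^2} + \frac{1}{bd^3} + \frac{1}{bd^4} + \cdots &= \frac{(1/(bd))^2}{1/(bd) - 1/(bd^2)} = \frac{1}{b(d-1)},\\
\frac{1}{bd^2} + \frac{1}{bd^3} + \frac{1}{bd^4} + \cdots &= \frac{(1/(bd^2))^2}{1/(bd^2) - 1/(bd^3)} = \frac{1}{bd(d-1)},\\
\frac{1}{bd^3} + \frac{1}{bd^4} + \cdots &= \frac{(1/(bd^3))^2}{1/(bd^3) - 1/(bd^4)} = \frac{1}{bd^2(d-1)},\\
&\;\;\vdots
\end{aligned}$$

Upon adding down the columns, he found

$$\begin{aligned}
N &= \frac{1}{b} + \frac{2}{bd} + \frac{3}{bd^2} + \frac{4}{bd^3} + \frac{5}{bd^4} + \cdots\\
&= \frac{d}{b(d-1)} + \frac{1}{b(d-1)} + \frac{1}{bd(d-1)} + \frac{1}{bd^2(d-1)} + \cdots\\
&= \frac{d}{d-1}\left[\frac{1}{b} + \frac{1}{bd} + \frac{1}{bd^2} + \frac{1}{bd^3} + \cdots\right] = \frac{d}{d-1}\left[\frac{(1/b)^2}{1/b - 1/(bd)}\right] = \frac{d^2}{b(d-1)^2}
\end{aligned}$$

because the infinite series in brackets is again geometric. Q.E.D.

For instance, with $b = 1$ and $d = 7$, we have $1 + \frac{2}{7} + \frac{3}{49} + \frac{4}{343} + \frac{5}{2401} + \cdots = \frac{49}{1 \times 6^2} = \frac{49}{36}$.
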
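Theorem N is easy to test numerically. The sketch below is an illustration under our own assumptions (truncation after 200 terms and the hypothetical helper names `figurate_series` and `theorem_N`); it compares partial sums with $\frac{d^2}{b(d-1)^2}$, including the example $b = 1$, $d = 7$.

```python
def figurate_series(numerator, b, d, n_terms=200):
    """Partial sum of numerator(k) / (b * d**(k-1)) for k = 1, ..., n_terms."""
    return sum(numerator(k) / (b * d ** (k - 1)) for k in range(1, n_terms + 1))

def theorem_N(b, d):
    """Jakob's closed form d^2 / (b (d-1)^2), valid for d > 1."""
    return d**2 / (b * (d - 1) ** 2)

for b, d in [(1, 7), (2, 3), (0.5, 1.5)]:
    print(b, d, figurate_series(lambda k: k, b, d), theorem_N(b, d))
```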
Next, Jakob put triangular numbers in the numerators. 


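Theorem T: If $d > 1$, then $T = \frac{1}{b} + \frac{3}{bd} + \frac{6}{bd^2} + \frac{10}{bd^3} + \frac{15}{bd^4} + \cdots = \frac{d^3}{b(d-1)^3}$.

Proof: The trick is to break $T$ into a string of geometric series and exploit the fact that the $k$th triangular number is $1 + 2 + 3 + \cdots + k$:

$$\begin{aligned}
\frac{1}{b} + \frac{1}{bd} + \frac{1}{bd^2} + \frac{1}{bd^3} + \frac{1}{bd^4} + \cdots &= \frac{(1/b)^2}{1/b - 1/(bd)} = \frac{d}{b(d-1)},\\
\frac{2}{bd} + \frac{2}{bd^2} + \frac{2}{bd^3} + \frac{2}{bd^4} + \cdots &= \frac{(2/(bd))^2}{2/(bd) - 2/(bd^2)} = \frac{2}{b(d-1)},\\
\frac{3}{bd^2} + \frac{3}{bd^3} + \frac{3}{bd^4} + \cdots &= \frac{(3/(bd^2))^2}{3/(bd^2) - 3/(bd^3)} = \frac{3}{bd(d-1)},\\
\frac{4}{bd^3} + \frac{4}{bd^4} + \cdots &= \frac{(4/(bd^3))^2}{4/(bd^3) - 4/(bd^4)} = \frac{4}{bd^2(d-1)},\\
&\;\;\vdots
\end{aligned}$$

Adding down the columns gives

$$T = \frac{1}{b} + \frac{1+2}{bd} + \frac{1+2+3}{bd^2} + \frac{1+2+3+4}{bd^3} + \cdots = \frac{d}{b(d-1)} + \frac{2}{b(d-1)} + \frac{3}{bd(d-1)} + \frac{4}{bd^2(d-1)} + \cdots$$

In other words,

$$T = \frac{d}{d-1}\left[\frac{1}{b} + \frac{2}{bd} + \frac{3}{bd^2} + \frac{4}{bd^3} + \cdots\right] = \frac{d}{d-1}\,N = \frac{d}{d-1}\cdot\frac{d^2}{b(d-1)^2} = \frac{d^3}{b(d-1)^3}$$

by theorem N. Q.E.D.

For example, with $b = 2$ and $d = 4$, we have $\frac{1}{2} + \frac{3}{8} + \frac{6}{32} + \frac{10}{128} + \frac{15}{512} + \cdots = \frac{4^3}{2 \times 3^3} = \frac{32}{27}$.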
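The same kind of check works for theorem T, and it also lets us confirm the key step $T = \frac{d}{d-1}N$. As before, `figurate_series` is our own illustrative helper and the series is cut off after 200 terms.

```python
def figurate_series(numerator, b, d, n_terms=200):
    """Partial sum of numerator(k) / (b * d**(k-1)) for k = 1, ..., n_terms."""
    return sum(numerator(k) / (b * d ** (k - 1)) for k in range(1, n_terms + 1))

def triangular(k):
    return k * (k + 1) // 2          # 1, 3, 6, 10, 15, ...

b, d = 2, 4
T = figurate_series(triangular, b, d)
N = figurate_series(lambda k: k, b, d)
print(T, 32 / 27)                    # Jakob's worked example
print(T, d / (d - 1) * N)            # the step T = d/(d-1) * N used in the proof
```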


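Jakob then considered pyramidal numbers in the numerators.

Theorem P: If $d > 1$, then $P = \frac{1}{b} + \frac{4}{bd} + \frac{10}{bd^2} + \frac{20}{bd^3} + \frac{35}{bd^4} + \cdots = \frac{d^4}{b(d-1)^4}$.

Proof: This follows easily because

$$P = \left[\frac{1}{b} + \frac{3}{bd} + \frac{6}{bd^2} + \frac{10}{bd^3} + \frac{15}{bd^4} + \cdots\right] + \left[\frac{1}{bd} + \frac{4}{bd^2} + \frac{10}{bd^3} + \frac{20}{bd^4} + \frac{35}{bd^5} + \cdots\right] = T + \frac{1}{d}P.$$

Hence $P = \frac{d}{d-1}\,T = \frac{d}{d-1}\cdot\frac{d^3}{b(d-1)^3} = \frac{d^4}{b(d-1)^4}$. Q.E.D.

As an example, with $b = 5$ and $d = 5$, we have

$$\sum_{k=1}^{\infty} \frac{\binom{k+2}{3}}{5^k} = \frac{1}{5} + \frac{4}{25} + \frac{10}{125} + \frac{20}{625} + \frac{35}{3125} + \cdots = \frac{125}{256}.$$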
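For theorem P, the proof rests on the fact that each pyramidal number is the previous pyramidal number plus a triangular number, which is what produces $P = T + \frac{1}{d}P$. This sketch (helper names are ours; `math.comb` supplies the binomial coefficients) checks that identity and the example with $b = d = 5$.

```python
from math import comb

def figurate_series(numerator, b, d, n_terms=200):
    """Partial sum of numerator(k) / (b * d**(k-1)) for k = 1, ..., n_terms."""
    return sum(numerator(k) / (b * d ** (k - 1)) for k in range(1, n_terms + 1))

def triangular(k):
    return comb(k + 1, 2)    # 1, 3, 6, 10, 15, ...

def pyramidal(k):
    return comb(k + 2, 3)    # 1, 4, 10, 20, 35, ...

# Each pyramidal number is the previous one plus a triangular number.
assert all(pyramidal(k) == pyramidal(k - 1) + triangular(k) for k in range(2, 100))

b, d = 5, 5
P = figurate_series(pyramidal, b, d)
T = figurate_series(triangular, b, d)
print(P, 125 / 256)      # Jakob's example
print(P, T + P / d)      # the relation P = T + P/d from the proof
```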
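
Jakob finished this part of the Tractatus by considering infinite series with the cubic numbers in the numerators and a geometric progression in the denominators.

Theorem C: If $d > 1$, then $C = \frac{1}{b} + \frac{8}{bd} + \frac{27}{bd^2} + \frac{64}{bd^3} + \frac{125}{bd^4} + \cdots = \frac{d^2(d^2 + 4d + 1)}{b(d-1)^4}$.

Proof:

$$\begin{aligned}
C &= \left[\frac{1}{b} + \frac{2}{bd} + \frac{3}{bd^2} + \frac{4}{bd^3} + \frac{5}{bd^4} + \cdots\right] + \left[\frac{6}{bd} + \frac{24}{bd^2} + \frac{60}{bd^3} + \frac{120}{bd^4} + \cdots\right]\\
&= N + \frac{6}{d}\left[\frac{1}{b} + \frac{4}{bd} + \frac{10}{bd^2} + \frac{20}{bd^3} + \frac{35}{bd^4} + \cdots\right] = N + \frac{6}{d}\,P,
\end{aligned}$$

so that

$$C = \frac{d^2}{b(d-1)^2} + \frac{6}{d}\cdot\frac{d^4}{b(d-1)^4} = \frac{d^2(d-1)^2 + 6d^3}{b(d-1)^4} = \frac{d^2(d^2 + 4d + 1)}{b(d-1)^4}.$$

Q.E.D.

When Jakob let $b = 2$ and $d = 2$, he concluded that

$$\sum_{k=1}^{\infty} \frac{k^3}{2^k} = \frac{1}{2} + \frac{8}{4} + \frac{27}{8} + \frac{64}{16} + \frac{125}{32} + \frac{216}{64} + \frac{343}{128} + \frac{512}{256} + \frac{729}{512} + \frac{1000}{1024} + \cdots = 26$$

exactly, surely a strange and nonintuitive result.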
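Theorem C hinges on splitting each cube as $k^3 = k + 6\binom{k+1}{3}$, since $k^3 - k = (k-1)k(k+1)$. The sketch below (our own helper names, 200-term truncation assumed) verifies that split and Jakob's value of 26.

```python
from math import comb

def figurate_series(numerator, b, d, n_terms=200):
    """Partial sum of numerator(k) / (b * d**(k-1)) for k = 1, ..., n_terms."""
    return sum(numerator(k) / (b * d ** (k - 1)) for k in range(1, n_terms + 1))

def theorem_C(b, d):
    """Closed form d^2 (d^2 + 4d + 1) / (b (d-1)^4), valid for d > 1."""
    return d**2 * (d**2 + 4 * d + 1) / (b * (d - 1) ** 4)

# The split used in the proof: k^3 = k + 6 * C(k+1, 3).
assert all(k**3 == k + 6 * comb(k + 1, 3) for k in range(1, 200))

cubes = figurate_series(lambda k: k**3, 2, 2)   # 1/2 + 8/4 + 27/8 + ...
print(cubes, theorem_C(2, 2))                   # both approximately 26
```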


After such successes, Jakob Bernoulli may have begun to feel invincible. If he entertained such a notion, he soon had second thoughts, for the series of reciprocals of square numbers, that is, $\sum_{k=1}^{\infty} \frac{1}{k^2}$, resisted all his efforts. He could show, using what we now recognize as the comparison test, that the series converges to some number less than 2, but he was unable to identify it. Swallowing his pride, Jakob included this plea in his Tractatus: “If anyone finds and communicates to us that which has thus far eluded our efforts great will be our gratitude” [16].
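Here is one standard comparison that keeps the partial sums below 2; we do not claim it is the exact comparison Jakob used. Since $\frac{1}{k^2} < \frac{1}{k(k-1)}$ for $k \ge 2$ and $\sum_{k=2}^{n}\frac{1}{k(k-1)}$ telescopes to $1 - \frac{1}{n}$, each partial sum is less than $2 - \frac{1}{n}$. The function names are illustrative.

```python
def basel_partial(n: int) -> float:
    """1 + 1/4 + 1/9 + ... + 1/n^2."""
    return sum(1.0 / k**2 for k in range(1, n + 1))

def telescoping_bound(n: int) -> float:
    """1 + sum_{k=2}^{n} 1/(k(k-1)), which collapses to 2 - 1/n."""
    return 2.0 - 1.0 / n

for n in (10, 1_000, 100_000):
    s = basel_partial(n)
    # 1/k^2 < 1/(k(k-1)) for k >= 2, so each partial sum sits below the bound
    assert s < telescoping_bound(n) < 2
    print(n, round(s, 6))
```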

As we shall see, Bernoulli’s challenge went unmet for a generation
until finally yielding to one of the greatest analysts of all time. 

Jakob Bernoulli was a master of infinite series. His brother Johann, 
equally gifted, had his own research interests. Among these was what he 
called the “exponential calculus,” which will be our next stop. 


JOHANN AND $x^x$


In a 1697 paper, Johann Bernoulli began with the following general rule: “The differential of a logarithm, no matter how composed, is equal to the differential of the expression divided by the expression” [17]. For instance, $d[\ln(x)] = \frac{dx}{x}$ or

$$d\left[\ln\sqrt{xx + yy}\right] = \frac{1}{2}\,d[\ln(xx + yy)] = \frac{1}{2}\left[\frac{2x\,dx + 2y\,dy}{xx + yy}\right] = \frac{x\,dx + y\,dy}{xx + yy}.$$

We have retained Bernoulli’s original notation for this last expression. At that time in mathematical publishing, higher powers were typeset as they are today, but the quadratic $x^2$ was often written $xx$. Also, in the interest of full disclosure, we observe that Bernoulli denoted the natural logarithm of $x$ by $lx$.


Johann wrote the corresponding integration formula as $\int \frac{dx}{x} = lx$. Early in his career he had been seriously confused on this point, believing that $\int \frac{dx}{x} = \int x^{-1}\,dx = \frac{x^0}{0} = \frac{1}{0} = \infty$, an overly enthusiastic application of the power rule and one that has yet to be eradicated from the repertoire of beginning calculus students [18]. Fortunately, Johann corrected his error.

With these preliminaries behind him, Johann promised to apply principles “first invented by me” to reap a rich harvest of knowledge “incrementing this new infinitesimal calculus with results not previously found or not widely known” [19]. Perhaps his most interesting example was the curve $y = x^x$, shown in figure 3.2.

For an arbitrary point $F$ on the curve, Johann sought the subtangent, that is, the length of segment $LE$ on the $x$-axis beneath the tangent line. To do this, he first took logs of both sides: $\ln(y) = \ln(x^x) = x\ln(x)$. He then used his rule to find the differentials:

$$\frac{dy}{y} = \frac{x\,dx}{x} + \ln x\,dx = (1 + \ln x)\,dx.$$

But $\frac{y}{LE}$ = slope of tangent line $= \frac{dy}{dx} = y(1 + \ln x)$, and he solved for the length of the subtangent

$$LE = \frac{y}{y(1 + \ln x)} = \frac{1}{1 + \ln x}.$$

Figure 3.2


Bernoulli next sought the minimum value—what he called the “least of all ordinates”—for the curve. This occurs when the tangent line is horizontal or, equivalently, when the subtangent is infinite. Johann described a somewhat complicated geometric procedure for identifying the value of $x$ for which $1 + \ln x = 0$ [20].

His reasoning was fine, but the form of his answer seems, to modern 
tastes, less than optimal. Johann was hampered because the introduction 
of the exponential function still lay decades in the future, so he lacked a notation to express the result simply. We now can solve for $x = 1/e$ and conclude that the minimum value of $x^x$, that is, the length of segment $CM$ in figure 3.2, is $\left(\frac{1}{e}\right)^{1/e} = \frac{1}{\sqrt[e]{e}}$, a number roughly equal to 0.6922. This answer, it goes without saying, is by no means obvious.
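Johann's results for $y = x^x$ can be spot-checked numerically. In this sketch (the function names `f` and `f_prime` are ours), a central difference confirms $\frac{dy}{dx} = x^x(1 + \ln x)$, and a fine grid over $(0, 1]$ confirms that nothing dips below the value at $x = 1/e$.

```python
import math

def f(x: float) -> float:
    return x ** x

def f_prime(x: float) -> float:
    """Derivative from the text: dy/dx = y (1 + ln x)."""
    return f(x) * (1.0 + math.log(x))

x_min = 1 / math.e                  # where 1 + ln x = 0
y_min = f(x_min)                    # (1/e)^(1/e)
print(x_min, y_min)

# Central-difference check of the derivative formula at a sample point.
x, h = 0.7, 1e-6
assert abs((f(x + h) - f(x - h)) / (2 * h) - f_prime(x)) < 1e-6

# No point on a fine grid over (0, 1] falls below the claimed minimum.
assert min(f(i / 10_000) for i in range(1, 10_001)) >= y_min - 1e-12
```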

Johann was just warming up. In another paper from 1697, he tackled a tougher problem: finding the area under his curve $y = x^x$ from $x = 0$ to $x = 1$. That is, he wanted the value of $\int_0^1 x^x\,dx$. Remarkably enough, he found what he was seeking [21].
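
The argument required two preliminaries. The first he expressed as follows:

$$\text{If } z = \ln N, \text{ then } N = 1 + z + \frac{z^2}{2} + \frac{z^3}{2 \times 3} + \frac{z^4}{2 \times 3 \times 4} + \cdots.$$

Here we recognize the expression for $N$ as the exponential series. If $N = x^x$, then $z = \ln N = x\ln x$, and Johann deduced that

$$x^x = 1 + x\ln x + \frac{x^2(\ln x)^2}{2} + \frac{x^3(\ln x)^3}{2 \times 3} + \frac{x^4(\ln x)^4}{2 \times 3 \times 4} + \cdots \qquad (3)$$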


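His objective was to integrate this sum by summing the individual integrals, and for this he needed formulas for $\int x^n(\ln x)^n\,dx$. He proceeded recursively to generate the table shown on this page.

$$\begin{aligned}
\int x\,lx\,dx &= \tfrac{1}{2}xx\,lx - \tfrac{1}{4}xx,\\
\int x^2(lx)^2\,dx &= \tfrac{1}{3}x^3(lx)^2 - \tfrac{2}{9}x^3\,lx + \tfrac{2}{27}x^3,\\
\int x^3(lx)^3\,dx &= \tfrac{1}{4}x^4(lx)^3 - \tfrac{3}{16}x^4(lx)^2 + \tfrac{6}{64}x^4\,lx - \tfrac{6}{256}x^4,\\
\int x^4(lx)^4\,dx &= \tfrac{1}{5}x^5(lx)^4 - \tfrac{4}{25}x^5(lx)^3 + \tfrac{12}{125}x^5(lx)^2 - \tfrac{24}{625}x^5\,lx + \tfrac{24}{3125}x^5,\\
&\;\;\text{\&c.}
\end{aligned}$$

Johann Bernoulli’s integral table (1697)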

A modern approach would apply integration by parts to prove the reduction formula

$$\int x^m(\ln x)^n\,dx = \frac{1}{m+1}x^{m+1}(\ln x)^n - \frac{n}{m+1}\int x^m(\ln x)^{n-1}\,dx. \qquad (4)$$

For $m = n = 1$, the recursion in (4) gives

$$\int x\ln x\,dx = \frac{1}{2}x^2\ln x - \frac{1}{2}\int x\,dx = \frac{1}{2}x^2\ln x - \frac{1}{4}x^2.$$

(Like Bernoulli and other mathematicians of his day, we have ignored “$+\,C$” at the end of the integration formula.) For $m = n = 2$, we have

$$\begin{aligned}
\int x^2(\ln x)^2\,dx &= \frac{1}{3}x^3(\ln x)^2 - \frac{2}{3}\int x^2\ln x\,dx\\
&= \frac{1}{3}x^3(\ln x)^2 - \frac{2}{3}\left[\frac{1}{3}x^3\ln x - \frac{1}{3}\int x^2\,dx\right]\\
&= \frac{1}{3}x^3(\ln x)^2 - \frac{2}{9}x^3\ln x + \frac{2}{27}x^3,
\end{aligned}$$

where we have also applied (4) with $m = 2$ and $n = 1$.

In this fashion, we replicate Bernoulli’s list of integrals. Along with the exponential series in (3), this was the key to solving his curious problem.
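The reduction formula (4) turns naturally into a recursion on coefficients. The sketch below is our own construction, not Bernoulli's procedure: `antiderivative_coeffs` is a hypothetical helper returning exact rational coefficients $c_0, \ldots, c_n$ such that $\int x^m(\ln x)^n\,dx = x^{m+1}\sum_j c_j(\ln x)^j$. For $m = n$ it reproduces the rows of Bernoulli's table.

```python
from fractions import Fraction

def antiderivative_coeffs(m: int, n: int):
    """Return [c0, c1, ..., cn] with
    integral of x^m (ln x)^n dx = x^(m+1) * (c0 + c1 ln x + ... + cn (ln x)^n),
    obtained by applying reduction formula (4) repeatedly (constant omitted)."""
    if n == 0:
        return [Fraction(1, m + 1)]                  # integral of x^m is x^(m+1)/(m+1)
    lower = antiderivative_coeffs(m, n - 1)          # integral of x^m (ln x)^(n-1)
    factor = Fraction(n, m + 1)
    # (4): x^(m+1)(ln x)^n/(m+1) minus n/(m+1) times the lower antiderivative
    return [-factor * c for c in lower] + [Fraction(1, m + 1)]

for n in range(1, 5):
    coeffs = antiderivative_coeffs(n, n)
    # At x = 1 every ln-term vanishes, and x^(n+1) (ln x)^j -> 0 as x -> 0+,
    # so the definite integral over [0, 1] is just the constant coefficient c0.
    print(n, coeffs, "integral over [0, 1] =", coeffs[0])
```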


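Theorem: $\displaystyle\int_0^1 x^x\,dx = 1 - \frac{1}{2^2} + \frac{1}{3^3} - \frac{1}{4^4} + \cdots = \sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k^k}$.

Proof: By (3),

$$\begin{aligned}
\int_0^1 x^x\,dx &= \int_0^1 \left[1 + x\ln x + \frac{x^2(\ln x)^2}{2} + \frac{x^3(\ln x)^3}{2 \times 3} + \frac{x^4(\ln x)^4}{2 \times 3 \times 4} + \cdots\right] dx\\
&= \int_0^1 dx + \int_0^1 x\ln x\,dx + \frac{1}{2}\int_0^1 x^2(\ln x)^2\,dx + \frac{1}{2 \times 3}\int_0^1 x^3(\ln x)^3\,dx\\
&\qquad + \frac{1}{2 \times 3 \times 4}\int_0^1 x^4(\ln x)^4\,dx + \cdots,
\end{aligned}$$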
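where Bernoulli replaced the integral of the series by the series of the integrals without blinking an eye. Using the formulas from his table, he continued:

$$\begin{aligned}
\int_0^1 x^x\,dx &= \Big[x\Big]_0^1 + \Big[\tfrac{1}{2}x^2\ln x - \tfrac{1}{4}x^2\Big]_0^1 + \frac{1}{2}\Big[\tfrac{1}{3}x^3(\ln x)^2 - \tfrac{2}{9}x^3\ln x + \tfrac{2}{27}x^3\Big]_0^1\\
&\quad + \frac{1}{2 \times 3}\Big[\tfrac{1}{4}x^4(\ln x)^3 - \tfrac{3}{16}x^4(\ln x)^2 + \tfrac{6}{64}x^4\ln x - \tfrac{6}{256}x^4\Big]_0^1\\
&\quad + \frac{1}{2 \times 3 \times 4}\Big[\tfrac{1}{5}x^5(\ln x)^4 - \tfrac{4}{25}x^5(\ln x)^3 + \tfrac{12}{125}x^5(\ln x)^2 - \tfrac{24}{625}x^5\ln x + \tfrac{24}{3125}x^5\Big]_0^1 + \cdots
\end{aligned}$$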


Here he observed that upon substituting $x = 1$, “all terms in which are found $lx$, or any power . . . of the natural logarithm vanish, insofar as the logarithm of unity is zero” [22]. This is fine, but a modern reader may be puzzled that no mention was made about substituting $x = 0$ to produce indeterminate expressions like $0 \cdot (\ln 0)^n$. Today, we would apply l’Hospital’s rule (a most fitting choice!) to show that $\lim_{x \to 0^+} x^m(\ln x)^n = 0$.
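In any case, after so many terms had vanished, Bernoulli was left with

$$\begin{aligned}
\int_0^1 x^x\,dx &= 1 - \frac{1}{4} + \frac{1}{2}\left(\frac{2}{27}\right) - \frac{1}{2 \times 3}\left(\frac{6}{256}\right) + \frac{1}{2 \times 3 \times 4}\left(\frac{24}{3125}\right) - \cdots\\
&= 1 - \frac{1}{4} + \frac{1}{27} - \frac{1}{256} + \frac{1}{3125} - \cdots\\
&= 1 - \frac{1}{2^2} + \frac{1}{3^3} - \frac{1}{4^4} + \frac{1}{5^5} - \cdots.
\end{aligned}$$

Q.E.D.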


It is quite remarkable that this series gives the area beneath the curve $y = x^x$ over the unit interval. Beyond its splendid symmetry and immediate visual appeal, it has another attribute not lost on Johann. He noted, “This wonderful series converges so rapidly that the tenth term contributes only a thousandth of a millionth part of unity to the sum” [23]. To be sure, it takes only a handful of terms to calculate $\int_0^1 x^x\,dx = 0.7834305107$ accurately to ten places.
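Johann's claim about rapid convergence is easy to confirm. This sketch sums the first ten terms of the series and, as an independent cross-check, applies composite Simpson's rule (our choice of method and step count, not anything from the source) directly to $x^x$ on $[0, 1]$.

```python
import math

# Johann's series 1 - 1/2^2 + 1/3^3 - ...; the tenth term is only 1/10^10.
series = sum((-1) ** (k + 1) / k**k for k in range(1, 11))

def simpson(func, a: float, b: float, n: int) -> float:
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

# Python evaluates 0.0 ** 0.0 as 1.0, the limiting value of x^x at 0.
area = simpson(lambda x: x ** x, 0.0, 1.0, 100_000)
print(f"series: {series:.10f}   Simpson: {area:.10f}")
```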

As the examples in this chapter should make clear, Jakob and Johann 
Bernoulli were worthy disciples of Gottfried Wilhelm Leibniz. In their 
hands, his calculus became, as we might say today, “user-friendly.” The 
brothers left the subject in a more sophisticated yet much more under- 
standable state than they found it. 

And Johann had one other legacy. In the 1720s, he mentored a young 
Swiss student of almost limitless promise. The student’s name was Leon- 
hard Euler, and we sample his work next. 


CHAPTER 4 




Euler 


Leonhard Euler 


In any accounting of history’s greatest mathematicians, Leonhard
Euler (1707-1783) stands tall. With broad and inexhaustible interests, 
he revolutionized mathematics, extending the boundaries of such well- 
established subdisciplines as number theory, algebra, and geometry even 
while giving birth to new ones like graph theory, the calculus of variations, 
and the theory of partitions. When in 1911 scholars began publishing his 
collected works, the Opera omnia, they faced a daunting challenge. Today, 
after more than seventy volumes and 25,000 pages in print, the task is not 
yet complete. This enormous publishing project, consuming the better part 
of a century, bears witness to a mathematical force of nature. 

That force was especially evident in analysis. Among Euler’s collected
works are eighteen thick volumes and nearly 9000 pages on the subject. 
These include landmark textbooks on functions (1748), differential calculus 
(1755), and integral calculus (1768), as well as dozens of papers on topics
ranging from differential equations to infinite series to elliptic integrals. As a 
consequence, Euler has been described as “analysis incarnate” [1]. 

It is impossible to do justice to these contributions in a short chapter. 
Rather, we have selected five topics to illustrate the sweep of Euler’s
achievements. We begin with an example from elementary calculus, fea- 
turing the bold—some may say reckless—approach so characteristic of 
his work. 


A DIFFERENTIAL FROM EULER 


In his text Institutiones calculi differentialis of 1755, Euler presented the 
familiar formulas of differential calculus [2]. These depended upon the 
notion of “infinitely small quantities,” which he characterized as follows: 


There is no doubt that any quantity can be diminished until it all 
but vanishes and then goes to nothing. But an infinitely small 
quantity is nothing but a vanishing quantity, and so it is really 
equal to 0... . There is really not such a great mystery lurking in 
this idea as some commonly think and thus have rendered the 
calculus of the infinitely small suspect to so many. [3] 


For Euler, the differential dx was zero: nothing more, nothing less—in 
short, nothing at all. The expressions x and x + dx were therefore equal 
and could be interchanged as the situation required. He observed that “the 
infinitely small vanishes in comparison with the finite and hence can be 
neglected” [4]. Moreover, powers like $(dx)^2$ or $(dx)^3$ are infinitely smaller
than the infinitely small dx and likewise can be jettisoned at will. 

It was often the ratio of differentials that Euler sought, and determining 
this ratio, which amounted to assigning a value to 0/0, was the mission of 
calculus. As he put it, “the whole force of differential calculus is concerned 
with the investigation of the ratios of any two infinitely small quantities” [5]. 

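As an illustration, we consider his treatment of the function $y = \sin x$. Euler began with Newton’s series (where we employ the modern “factorial” notation):

$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots \quad\text{and}\quad \cos z = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \frac{z^6}{6!} + \cdots. \qquad (1)$$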

Substituting the differential $dx$ for $z$, he reasoned that

$$\sin(dx) = dx - \frac{(dx)^3}{3!} + \frac{(dx)^5}{5!} - \frac{(dx)^7}{7!} + \cdots \quad\text{and}\quad \cos(dx) = 1 - \frac{(dx)^2}{2!} + \frac{(dx)^4}{4!} - \frac{(dx)^6}{6!} + \cdots.$$

Because the higher powers of the differential are insignificant compared to $dx$ or to constants, these series reduced to

$$\sin(dx) = dx \quad\text{and}\quad \cos(dx) = 1. \qquad (2)$$


In the equation $y = \sin x$, Euler replaced $x$ by $x + dx$ and $y$ by $y + dy$ (which for him changed nothing) and employed the identity $\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta$ and (2) to get

$$y + dy = \sin(x + dx) = \sin x\cos(dx) + \cos x\sin(dx) = \sin x + (\cos x)\,dx.$$

Subtracting $y = \sin x$ from both sides, he was left with $dy = \sin x + (\cos x)\,dx - y = (\cos x)\,dx$, which he turned into a verbal recipe: “the differential of the sine of any arc is equal to the product of the differential of the arc and the cosine of the arc” [6]. It follows that the ratio of these differentials—what we, of course, call the derivative—is $\frac{dy}{dx} = \frac{(\cos x)\,dx}{dx} = \cos x$. Nothing to it!
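Euler's vanishing differentials have a modern counterpart in the difference quotient with a small but nonzero $h$. This sketch (the function name is ours) shows the error of $\frac{\sin(x+h) - \sin x}{h}$ against $\cos x$ shrinking as $h$ shrinks.

```python
import math

def difference_quotient_error(x: float, h: float) -> float:
    """(sin(x+h) - sin x)/h minus cos x, with a small nonzero h in place of dx."""
    return (math.sin(x + h) - math.sin(x)) / h - math.cos(x)

x = 0.8
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"h = {h:g}: error = {difference_quotient_error(x, h):.2e}")
```

The error is roughly $-\frac{h}{2}\sin x$, the trace of the $\frac{(dx)^2}{2!}$ term in $\cos(dx)$ that Euler discarded.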


AN INTEGRAL FROM EULER 
Euler was one of history’s foremost integrators, and the more bizarre
the integrand, the better. His works, particularly volumes 17, 18, and 19
of the Opera omnia, are filled with such nontrivial examples as [7]:


$$\int_0^1 \frac{(\ln x)^5}{1+x}\,dx = -\frac{31\pi^6}{252}, \qquad \int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2},$$

$$\int_0^1 \frac{\sin(p\ln x)\cos(q\ln x)}{\ln x}\,dx = \frac12\arctan\left(\frac{2p}{1 - p^2 + q^2}\right).$$

This last features a particularly rich mixture of transcendental functions.

As our lone representative, we consider Euler’s evaluation of $\int_0^1 \frac{\sin(\ln x)}{\ln x}\,dx$ [8]. To begin, he employed a favorite strategy: introduce an infinite series when possible. From (1), he knew that

$$\sin(\ln x) = \ln x - \frac{(\ln x)^3}{3!} + \frac{(\ln x)^5}{5!} - \frac{(\ln x)^7}{7!} + \cdots,$$

and so

$$\frac{\sin(\ln x)}{\ln x} = 1 - \frac{(\ln x)^2}{3!} + \frac{(\ln x)^4}{5!} - \frac{(\ln x)^6}{7!} + \cdots.$$


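Replacing the integral of the infinite series by the infinite series of integrals, he continued:

$$\int_0^1 \frac{\sin(\ln x)}{\ln x}\,dx = \int_0^1 dx - \frac{1}{3!}\int_0^1 (\ln x)^2\,dx + \frac{1}{5!}\int_0^1 (\ln x)^4\,dx - \frac{1}{7!}\int_0^1 (\ln x)^6\,dx + \cdots. \qquad (3)$$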

Integrals of the form $\int_0^1 (\ln x)^n\,dx$ are reminiscent of Johann Bernoulli’s formulas from the previous chapter, and Euler instantly spotted their recursive pattern:

$$\int_0^1 (\ln x)^2\,dx = \Big[x(\ln x)^2 - 2x\ln x + 2x\Big]_0^1 = 2 = 2!,$$

$$\int_0^1 (\ln x)^4\,dx = \Big[x(\ln x)^4 - 4x(\ln x)^3 + 12x(\ln x)^2 - 24x\ln x + 24x\Big]_0^1 = 24 = 4!,$$

$$\int_0^1 (\ln x)^6\,dx = 720 = 6!, \quad\text{and so on.}$$


As noted in the previous chapter, $\lim_{x \to 0^+} x(\ln x)^n = 0$, which explains the disappearance of terms arising from substituting zero for x in these antiderivatives.


When Euler applied this pattern to (3), he found that

$$\int_0^1 \frac{\sin(\ln x)}{\ln x}\,dx = 1 - \frac{2!}{3!} + \frac{4!}{5!} - \frac{6!}{7!} + \cdots = 1 - \frac13 + \frac15 - \frac17 + \cdots.$$
This, of course, is the Leibniz series from chapter 2, so Euler finished in style:

$$\int_0^1 \frac{\sin(\ln x)}{\ln x}\,dx = \frac{\pi}{4}.$$
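Euler’s interchange of integral and sum went unjustified, but the value is easy to test numerically. The sketch below, assuming only Python’s standard library, approximates the integral with a composite midpoint rule (a quadrature choice of ours, not Euler’s) and compares it with a partial sum of 1 − 1/3 + 1/5 − ⋯, using the fact that each coefficient (2k)!/(2k + 1)! in the series reduces to 1/(2k + 1). Function names and step counts are illustrative.

```python
import math

def integrand(x: float) -> float:
    """sin(ln x) / ln x on (0, 1); it tends to 1 as x -> 1 and to 0 as x -> 0."""
    lx = math.log(x)
    return 1.0 if lx == 0.0 else math.sin(lx) / lx

def midpoint_rule(f, a: float, b: float, n: int) -> float:
    """Composite midpoint rule with n equal subintervals (never evaluates the endpoints)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def leibniz_partial_sum(terms: int) -> float:
    """Partial sum of 1 - 1/3 + 1/5 - ..., i.e. the series obtained from (3)."""
    return sum((-1) ** k / (2 * k + 1) for k in range(terms))

integral = midpoint_rule(integrand, 0.0, 1.0, 200_000)
series = leibniz_partial_sum(100_000)
print(integral, series, math.pi / 4)
```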


The derivation shows that Euler—like Newton, Leibniz, and the 
Bernoullis before him—was a spectacular (and fearless!) manipulator of 
infinite series. In fact, one could argue, based on the mathematicians seen 
thus far, that a high comfort level in working with infinite series defined an 
analyst in these early days. 

The appearance of π in the integral above leads us directly to the next
topic: Euler’s techniques for approximating this famous number.


EULER’S ESTIMATION OF π


By definition, π is the ratio of a circle’s circumference to its diameter.
From ancient times, people recognized that the ratio was constant from 
one circle to another, but attaching a numerical value to this constant has 
kept mathematicians busy for centuries. 

As is well known, Archimedes approximated π by inscribing (and circumscribing) regular polygons in (and about) a circle and then using the polygons’ perimeters to estimate the circle’s circumference. He began with regular inscribed and circumscribed hexagons and, upon doubling the number of sides to 12, to 24, to 48, and finally to 96, he showed that “the ratio of the circumference of any circle to its diameter is less than $3\frac{1}{7}$ but greater than $3\frac{10}{71}$” [9]. To two-place accuracy, this means π ≈ 3.14.

Subsequent mathematicians, whose number system was computationally simpler than that available in classical Greece, exploited his idea. In 1579, François Viète (1540-1603) found π accurately to nine places using polygons with $6 \times 2^{16} = 393{,}216$ sides. This geometrical approach reached a kind of zenith (or nadir) in the work of Ludolph van Ceulen (1540-1610), who used regular $2^{62}$-gons to calculate π to 35 decimal places in a phenomenal display of applied tedium that reportedly consumed the better part of his life [10].

Unfortunately, each new approximation in this process required taking a new square root. The estimate of π generated by Archimedes’ inscribed 96-gon was

$$48\sqrt{2 - \sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{3}}}}},$$
an expression that is a treat to the eye but a nightmare to the pencil. Yet
after these five square root extractions, we have only two-place accuracy. 
Worse was Viète’s nesting of seventeen square roots for his nine places
of accuracy, and unthinkably awful was Ludolph’s approximation featur- 
ing five dozen nested radicals, each calculated to thirty-five places— 
by hand! Euler compared such work unfavorably to the labors of 
Hercules [11]. 
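To see how quickly the radicals pile up, the following sketch (our own reconstruction of the side-doubling rule, with hypothetical function names) computes the perimeter of a regular polygon with 6 × 2^d sides inscribed in a circle of diameter 1, counting square roots as it goes. Four doublings give Archimedes’ 96-gon and its five radicals; sixteen give the 393,216 sides and seventeen radicals attributed to Viète above.

```python
import math

def inscribed_polygon_estimate(doublings: int) -> tuple[float, int]:
    """Perimeter of a regular polygon with 6 * 2**doublings sides inscribed in a
    circle of diameter 1, together with the number of square roots extracted."""
    if doublings < 1:
        raise ValueError("double the hexagon at least once")
    r = math.sqrt(3)            # 2cos(pi/6), i.e. 2cos(2*pi/12) for the 12-gon
    roots = 1
    for _ in range(doublings - 1):
        r = math.sqrt(2 + r)    # halve the angle: 2cos(t/2) = sqrt(2 + 2cos t)
        roots += 1
    sides = 6 * 2 ** doublings  # now r = 2cos(2*pi/sides)
    roots += 1
    # 2sin(pi/sides) = sqrt(2 - r); the perimeter is sides * sin(pi/sides).
    return sides / 2 * math.sqrt(2 - r), roots

estimate, roots = inscribed_polygon_estimate(4)   # Archimedes' 96-gon
print(estimate, roots)
```

In double precision the subtraction 2 − r cancels most of its leading digits once the polygon has many sides, so a floating-point run with many doublings loses several digits to rounding.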

Fortunately, there was another way. As we mentioned in chapter 2, 
James Gregory discovered the infinite series for arctangent: 


$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \qquad (4)$$

For x = 1, this becomes Leibniz’s series $\frac{\pi}{4} = \arctan(1) = 1 - \frac13 + \frac15 - \frac17 + \cdots$, which, as we observed, is of no value in approximating π because of its glacial rate of convergence.


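However, if we substitute a value of x closer to zero, the convergence is more rapid. For instance, letting $x = 1/\sqrt{3}$ in (4), we get

$$\frac{\pi}{6} = \arctan\left(\frac{1}{\sqrt3}\right) = \frac{1}{\sqrt3} - \frac{1}{3\sqrt3 \times 3} + \frac{1}{9\sqrt3 \times 5} - \frac{1}{27\sqrt3 \times 7} + \cdots,$$

so that

$$\pi = \frac{6}{\sqrt3}\left[1 - \frac{1}{3 \times 3} + \frac{1}{9 \times 5} - \frac{1}{27 \times 7} + \cdots\right].$$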
This is an improvement over the Leibniz series because its denominators are growing much faster. On the other hand, $1/\sqrt3 \approx 0.577$, which is not all that small, and this series involves a square root that itself would have to be approximated.
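The difference in convergence speed can be measured directly. The sketch below sums the first few terms of Gregory’s series (4) at x = 1 and at x = 1/√3 and reports how far 4·arctan and 6·arctan land from π. Note that it leans on `math.sqrt` for the very square root an eighteenth-century calculator would have had to extract by hand. The function name is hypothetical.

```python
import math

def arctan_partial_sum(x: float, terms: int) -> float:
    """First `terms` terms of Gregory's series (4): x - x^3/3 + x^5/5 - ..."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

for terms in (5, 10, 20):
    leibniz_error = abs(4 * arctan_partial_sum(1.0, terms) - math.pi)
    sixth_error = abs(6 * arctan_partial_sum(1 / math.sqrt(3), terms) - math.pi)
    print(f"{terms:2d} terms: Leibniz error {leibniz_error:.1e}, "
          f"x = 1/sqrt(3) error {sixth_error:.1e}")
```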

For a mathematician of the eighteenth century, the ideal formula 
would use Gregory’s infinite series with a value of x quite close to zero while 
avoiding square roots altogether. This is precisely what Euler described in a 
1779 paper [12]. His key observation, which at first glance looks like a
typographical error, was that 


$$\pi = 20\arctan(1/7) + 8\arctan(3/79). \qquad (5)$$


Improbable though it may seem, this is an equation, not an estimate. Here 
is how Euler proved it. 


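He started with the identity $\tan(\alpha - \beta) = \frac{\tan\alpha - \tan\beta}{1 + (\tan\alpha)(\tan\beta)}$, which can be recast as $\alpha - \beta = \arctan\left[\frac{\tan\alpha - \tan\beta}{1 + (\tan\alpha)(\tan\beta)}\right]$. Euler let $\tan\alpha = x/y$ and $\tan\beta = z/w$ to get

$$\arctan\left(\frac{x}{y}\right) - \arctan\left(\frac{z}{w}\right) = \arctan\left[\frac{x/y - z/w}{1 + (x/y)(z/w)}\right],$$

or simply

$$\arctan\left(\frac{x}{y}\right) = \arctan\left(\frac{z}{w}\right) + \arctan\left(\frac{xw - yz}{yw + xz}\right). \qquad (6)$$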


He then substituted a string of cleverly chosen rationals. First, Euler set x = y = z = 1 and w = 2 in (6) to get $\frac{\pi}{4} = \arctan(1) = \arctan\left(\frac12\right) + \arctan\left(\frac13\right)$, so that

$$\pi = 4\arctan\left(\frac12\right) + 4\arctan\left(\frac13\right). \qquad (7)$$


He could have stopped there, using (7) to approximate π via Gregory’s
arctangent series, but the input values of 1/2 and 1/3 were too large to give
the rapid convergence he desired. Instead, Euler returned to (6) with x = 1,
y = 2, z = 1, and (for reasons not immediately apparent) w = 7. This led to


arctan(1/2) = arctan(1/7) + arctan(5/15) = arctan(1/7) + arctan(1/3), 
which, when substituted into (7), gave the new expression


π = 4[arctan(1/7) + arctan(1/3)] + 4 arctan(1/3)
= 4 arctan(1/7) + 8 arctan(1/3). (8)


Next, Euler chose x = 1, y = 3, z = 1, and w = 7 to conclude from (6) that
arctan(1/3) = arctan(1/7) + arctan(2/11). This he substituted into (8) to get 


π = 12 arctan(1/7) + 8 arctan(2/11). (9)


In a final iteration of (6), Euler let x = 2, y = 11, z = 1, and w = 7 so that
arctan(2/11) = arctan(1/7) + arctan(3/79), which in turn transformed (9) 
into the peculiar result stated in (5): 


π = 12 arctan(1/7) + 8[arctan(1/7) + arctan(3/79)]
= 20 arctan(1/7) + 8 arctan(3/79).
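Each application of (6) can be checked with exact rational arithmetic. In this sketch `split_arctan` is a hypothetical helper returning the fraction (xw − yz)/(yw + xz). The four tuples are the substitutions (x, y, z, w) listed above, and a floating-point evaluation of (5) closes the loop. The arctangent subtraction formula behind (6) is valid here because all the quantities involved are positive.

```python
from fractions import Fraction
import math

def split_arctan(x: int, y: int, z: int, w: int) -> Fraction:
    """Last argument in (6): arctan(x/y) = arctan(z/w) + arctan((xw - yz)/(yw + xz))."""
    return Fraction(x * w - y * z, y * w + x * z)

# Euler's four substitutions, in the order used in the text.
steps = [
    ((1, 1, 1, 2), Fraction(1, 3)),    # arctan 1    = arctan 1/2 + arctan 1/3
    ((1, 2, 1, 7), Fraction(1, 3)),    # arctan 1/2  = arctan 1/7 + arctan 5/15
    ((1, 3, 1, 7), Fraction(2, 11)),   # arctan 1/3  = arctan 1/7 + arctan 2/11
    ((2, 11, 1, 7), Fraction(3, 79)),  # arctan 2/11 = arctan 1/7 + arctan 3/79
]
for args, expected in steps:
    assert split_arctan(*args) == expected

print(20 * math.atan(1 / 7) + 8 * math.atan(3 / 79), math.pi)
```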


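This expression for π is admirably suited to the arctangent series in (4), for it is free of square roots and uses the relatively small numbers 1/7 and 3/79 to produce rapid convergence. With only six terms from each series, we calculate

$$\pi = 20\arctan\left(\frac17\right) + 8\arctan\left(\frac{3}{79}\right)$$

$$\approx 20\left[\frac17 - \frac{(1/7)^3}{3} + \frac{(1/7)^5}{5} - \frac{(1/7)^7}{7} + \frac{(1/7)^9}{9} - \frac{(1/7)^{11}}{11}\right] + 8\left[\frac{3}{79} - \frac{(3/79)^3}{3} + \frac{(3/79)^5}{5} - \frac{(3/79)^7}{7} + \frac{(3/79)^9}{9} - \frac{(3/79)^{11}}{11}\right]$$

$$\approx 3.14159265357.$$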


Here, a dozen fractions provide an estimate of π accurate to two parts in a
hundred billion, a better approximation than Viète obtained by extracting
seventeen nested square roots. In fact, Euler claimed to have used such
techniques to approximate π to twenty places, “and all this calculation
consumed about an hour of work” [13].
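Euler’s dozen fractions can be reproduced exactly with Python’s `fractions.Fraction`, converting to a float only at the end. The sketch below, with a hypothetical helper name, sums six terms of (4) at 1/7 and at 3/79 and compares the result with the value quoted above.

```python
from fractions import Fraction
import math

def gregory_partial_sum(x: Fraction, terms: int) -> Fraction:
    """Exact sum of the first `terms` terms of (4) at a rational x."""
    return sum(
        ((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms)),
        Fraction(0),
    )

estimate = (20 * gregory_partial_sum(Fraction(1, 7), 6)
            + 8 * gregory_partial_sum(Fraction(3, 79), 6))
print(float(estimate))             # compare with 3.14159265357 quoted in the text
print(math.pi - float(estimate))   # positive: the truncated sum falls just short of pi
```

Because the first neglected term in each series is +x^13/13, the truncated sum falls slightly short of π; nearly all of that shortfall comes from the 1/7 series, since (3/79)^13 is vanishingly small.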
Recalling the lifetime that poor Ludolph devoted to his bewildering
tangle of square roots, one is tempted to change Euler’s nickname to “effi- 
ciency incarnate.” 


SPECTACULAR SUMS 


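In this section we shall see how Euler, by analyzing a single situation,
was able to find the exact values of

$$\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{2k-1} = 1 - \frac13 + \frac15 - \frac17 + \cdots \quad\text{(Leibniz’s series)},$$

$$\sum_{k=1}^{\infty} \frac{1}{k^2} = 1 + \frac14 + \frac19 + \frac{1}{16} + \cdots \quad\text{(Jakob Bernoulli’s challenge)},$$

$$\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{(2k-1)^3} = 1 - \frac{1}{27} + \frac{1}{125} - \frac{1}{343} + \cdots, \quad\text{and many more.}$$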


By unifying these sums under one theory, Euler cemented his reputation 
as one of history’s great series manipulators. 

The story begins with a result from his 1748 text, Introductio in 
analysin infinitorum. 


Lemma: If $P(x) = 1 + Ax + Bx^2 + Cx^3 + Dx^4 + \cdots = (1 + \alpha_1 x)(1 + \alpha_2 x)(1 + \alpha_3 x)\cdots$, then

$$\sum \alpha_k = A,$$

$$\sum \alpha_k^2 = A^2 - 2B,$$

$$\sum \alpha_k^3 = A^3 - 3AB + 3C,$$

$$\sum \alpha_k^4 = A^4 - 4A^2B + 4AC + 2B^2 - 4D, \quad\text{and so on,}$$

whether these factors be “finite or infinite in number” [14].
Proof: Euler observed that such formulas were “intuitively clear,” but promised a rigorous argument using differential calculus. This appeared in a
1750 paper on the theory of equations [15].
Before proving the lemma, we should clarify its meaning. Setting $0 = P(x) = (1 + \alpha_1 x)(1 + \alpha_2 x)(1 + \alpha_3 x)\cdots$, we solve for $x = -1/\alpha_1, -1/\alpha_2, -1/\alpha_3, \ldots$. The lemma thus connects the coefficients A, B, C, ... in the expression for P to the negative reciprocals of the solutions to P(x) = 0. In this light, the result seems to be primarily an algebraic one.

But Euler, the great analyst, saw it differently. He started by taking 
logarithms: 


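$$\begin{aligned} \ln[P(x)] &= \ln[1 + Ax + Bx^2 + Cx^3 + \cdots] \\ &= \ln[(1 + \alpha_1 x)(1 + \alpha_2 x)(1 + \alpha_3 x)\cdots] \\ &= \ln(1 + \alpha_1 x) + \ln(1 + \alpha_2 x) + \ln(1 + \alpha_3 x) + \cdots. \end{aligned}$$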


Then, making good on his promise to use calculus, he differentiated both 
sides to get 


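$$\frac{A + 2Bx + 3Cx^2 + 4Dx^3 + \cdots}{1 + Ax + Bx^2 + Cx^3 + \cdots} = \frac{\alpha_1}{1 + \alpha_1 x} + \frac{\alpha_2}{1 + \alpha_2 x} + \frac{\alpha_3}{1 + \alpha_3 x} + \cdots. \qquad (10)$$

It was evident to Euler that each fraction $\frac{\alpha_k}{1 + \alpha_k x}$ on the right-hand side was the sum of an infinite geometric series with first term $\alpha_k$ and common ratio $-\alpha_k x$. That is,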


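$$\frac{\alpha_1}{1 + \alpha_1 x} = \alpha_1 - \alpha_1^2 x + \alpha_1^3 x^2 - \alpha_1^4 x^3 + \cdots,$$

$$\frac{\alpha_2}{1 + \alpha_2 x} = \alpha_2 - \alpha_2^2 x + \alpha_2^3 x^2 - \alpha_2^4 x^3 + \cdots,$$

$$\frac{\alpha_3}{1 + \alpha_3 x} = \alpha_3 - \alpha_3^2 x + \alpha_3^3 x^2 - \alpha_3^4 x^3 + \cdots, \quad\text{and so on.}$$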


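Upon adding down the columns of this array and summing like powers of $\alpha_k$, he rewrote equation (10) as

$$\frac{A + 2Bx + 3Cx^2 + 4Dx^3 + \cdots}{1 + Ax + Bx^2 + Cx^3 + \cdots} = \sum \alpha_k - \left(\sum \alpha_k^2\right)x + \left(\sum \alpha_k^3\right)x^2 - \left(\sum \alpha_k^4\right)x^3 + \cdots.$$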
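This he cross-multiplied and expanded to get

$$\begin{aligned} A + 2Bx + 3Cx^2 + 4Dx^3 + \cdots &= [1 + Ax + Bx^2 + Cx^3 + \cdots]\left[\sum \alpha_k - \left(\sum \alpha_k^2\right)x + \left(\sum \alpha_k^3\right)x^2 - \left(\sum \alpha_k^4\right)x^3 + \cdots\right] \\ &= \sum \alpha_k + \left[A\sum \alpha_k - \sum \alpha_k^2\right]x + \left[B\sum \alpha_k - A\sum \alpha_k^2 + \sum \alpha_k^3\right]x^2 \\ &\quad + \left[C\sum \alpha_k - B\sum \alpha_k^2 + A\sum \alpha_k^3 - \sum \alpha_k^4\right]x^3 + \cdots. \end{aligned}$$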


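From here, Euler equated coefficients of like powers of x and so determined $\sum \alpha_k^n$ recursively:

(a) $\sum \alpha_k = A$,
(b) $A\sum \alpha_k - \sum \alpha_k^2 = 2B$, and so $\sum \alpha_k^2 = A\sum \alpha_k - 2B = A^2 - 2B$,
(c) $B\sum \alpha_k - A\sum \alpha_k^2 + \sum \alpha_k^3 = 3C$, and so

$$\sum \alpha_k^3 = A\sum \alpha_k^2 - B\sum \alpha_k + 3C = A[A^2 - 2B] - AB + 3C = A^3 - 3AB + 3C,$$

(d) $C\sum \alpha_k - B\sum \alpha_k^2 + A\sum \alpha_k^3 - \sum \alpha_k^4 = 4D$, and so $\sum \alpha_k^4 = A^4 - 4A^2B + 4AC + 2B^2 - 4D$.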


The process can be continued at will. In this way, by combining loga- 
rithms, derivatives, and geometric series, Euler proved his “intuitively 
clear” formulas! Q.E.D. 
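The recursion in steps (a) through (d) continues in the same way: equating the coefficients of x^(m−1) yields each power sum from the earlier ones. The sketch below (hypothetical names, exact rational arithmetic) checks this on a finite product, where Euler’s lemma reduces to the classical relations between coefficients and power sums known as Newton’s identities. It says nothing about the infinite case, whose convergence Euler did not examine. The sample values 1, −1/3, 1/5, −1/7 are arbitrary.

```python
from fractions import Fraction

def coefficients_from_factors(alphas):
    """Coefficients [1, A, B, C, ...] of (1 + a1 x)(1 + a2 x)... for finitely many factors."""
    coeffs = [Fraction(1)]
    for a in alphas:
        new = coeffs + [Fraction(0)]
        for i, c in enumerate(coeffs):
            new[i + 1] += a * c  # multiplying by (1 + a x) shifts and scales
        coeffs = new
    return coeffs

def power_sums(coeffs, n):
    """Return [None, sum a_k, sum a_k^2, ..., sum a_k^n] computed from the
    coefficients of P by equating coefficients of x^(m-1), as in (a)-(d)."""
    c = list(coeffs) + [Fraction(0)] * (n + 1 - len(coeffs))
    p = [None] * (n + 1)
    for m in range(1, n + 1):
        total = m * c[m]
        for j in range(1, m):
            total -= (-1) ** (m - 1 - j) * c[j] * p[m - j]
        p[m] = (-1) ** (m - 1) * total
    return p

alphas = [Fraction(1), Fraction(-1, 3), Fraction(1, 5), Fraction(-1, 7)]
coeffs = coefficients_from_factors(alphas)
p = power_sums(coeffs, 4)
print(p[1:], [sum(a ** m for a in alphas) for m in range(1, 5)])
```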


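To demonstrate their relevance, he considered the general expression $P(x) = \cos\left(\frac{\pi}{2n}x\right) + \left[\tan\left(\frac{m\pi}{2n}\right)\right]\sin\left(\frac{\pi}{2n}x\right)$, although we here restrict our attention to the case where m = 1 and n = 2 [16]. That is, we consider

$$P(x) = \cos\left(\frac{\pi}{4}x\right) + \left[\tan\left(\frac{\pi}{4}\right)\right]\sin\left(\frac{\pi}{4}x\right) = \cos\left(\frac{\pi}{4}x\right) + \sin\left(\frac{\pi}{4}x\right).$$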
To apply the lemma, we must write P as an infinite series and as an infinite product of factors of the form $(1 + \alpha_k x)$, where $-1/\alpha_k$ is a root of P(x) = 0. The former is easy, for we need only shuffle together the series for cosine and sine from (1) to get


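$$P(x) = 1 + \frac{\pi}{4}x - \frac{\pi^2}{4^2 \cdot 2!}x^2 - \frac{\pi^3}{4^3 \cdot 3!}x^3 + \frac{\pi^4}{4^4 \cdot 4!}x^4 + \frac{\pi^5}{4^5 \cdot 5!}x^5 - \cdots.$$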


We thus identify the coefficients from the lemma as 


$$A = \frac{\pi}{4}, \quad B = -\frac{\pi^2}{32}, \quad C = -\frac{\pi^3}{384}, \quad D = \frac{\pi^4}{6144}, \quad \ldots.$$


On the other hand, setting $0 = P(x) = \cos\left(\frac{\pi}{4}x\right) + \sin\left(\frac{\pi}{4}x\right)$ leads to $\tan\left(\frac{\pi}{4}x\right) = -1$, whose roots are x = −1, 3, −5, 7, −9, .... The negative reciprocals of these roots will be the $\alpha_k$ from the lemma, so that $\alpha_1 = 1$, $\alpha_2 = -1/3$, $\alpha_3 = 1/5$, $\alpha_4 = -1/7$, $\alpha_5 = 1/9$, ....


At last Euler could reap his rewards. According to the lemma, $\sum \alpha_k = A$, and so $1 - \frac13 + \frac15 - \frac17 + \frac19 - \cdots = \frac{\pi}{4}$. Here we have the Leibniz series making a return appearance. Note that in contrast to Leibniz’s complicated, geometric derivation from chapter 2, Euler’s was purely analytic with no evident triangles, curves, or graphs.


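The lemma’s second relationship was $\sum \alpha_k^2 = A^2 - 2B$, which for our specific function P provides the sum of reciprocals of the odd squares:

$$1 + \frac19 + \frac{1}{25} + \frac{1}{49} + \frac{1}{81} + \cdots = \frac{\pi^2}{16} + \frac{\pi^2}{16} = \frac{\pi^2}{8}.$$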
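From this, Euler could easily answer Bernoulli’s question about the sum of the reciprocals of all the squares, because

$$\begin{aligned} 1 + \frac14 + \frac19 + \frac{1}{16} + \frac{1}{25} + \frac{1}{36} + \frac{1}{49} + \cdots &= \left[1 + \frac19 + \frac{1}{25} + \frac{1}{49} + \frac{1}{81} + \cdots\right] + \frac14\left[1 + \frac14 + \frac19 + \frac{1}{16} + \frac{1}{25} + \cdots\right] \\ &= \frac{\pi^2}{8} + \frac14\left[1 + \frac14 + \frac19 + \frac{1}{16} + \frac{1}{25} + \cdots\right]. \end{aligned}$$

Thus $\frac34\left[1 + \frac14 + \frac19 + \frac{1}{16} + \cdots\right] = \frac{\pi^2}{8}$, and so

$$1 + \frac14 + \frac19 + \frac{1}{16} + \frac{1}{25} + \frac{1}{36} + \frac{1}{49} + \cdots = \frac43 \times \frac{\pi^2}{8} = \frac{\pi^2}{6}.$$

The resolution of Bernoulli’s challenge was another feather in Euler’s feather-laden hat.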
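The next equation from the lemma, $\sum \alpha_k^3 = A^3 - 3AB + 3C$, yielded the alternating series:

$$1 - \frac{1}{27} + \frac{1}{125} - \frac{1}{343} + \frac{1}{729} - \cdots = \left(\frac{\pi}{4}\right)^3 - 3\left(\frac{\pi}{4}\right)\left(-\frac{\pi^2}{32}\right) + 3\left(-\frac{\pi^3}{384}\right) = \frac{\pi^3}{32}.$$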
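Euler’s three closed forms can be compared with brute-force partial sums. The sketch below, with hypothetical names and an arbitrary cutoff N, does exactly that. The tails of the first two series shrink only like 1/N, which is why an exact value is so much more satisfying than any amount of adding.

```python
import math

def partial_sum(term, n: int) -> float:
    """Sum term(k) for k = 1, ..., n."""
    return sum(term(k) for k in range(1, n + 1))

N = 100_000
odd_squares = partial_sum(lambda k: 1 / (2 * k - 1) ** 2, N)
all_squares = partial_sum(lambda k: 1 / k ** 2, N)
alt_odd_cubes = partial_sum(lambda k: (-1) ** (k + 1) / (2 * k - 1) ** 3, N)

print(odd_squares, math.pi ** 2 / 8)
print(all_squares, math.pi ** 2 / 6)
print(alt_odd_cubes, math.pi ** 3 / 32)
```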


And on he went, using the lemma repeatedly to derive such formulas as

$$\sum_{k=1}^{\infty} \frac{1}{k^4} = \frac{\pi^4}{90} \quad\text{and}\quad \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{(2k-1)^5} = \frac{5\pi^5}{1536}$$

and many more. This spectacular achievement calls to mind Ivor Grattan-Guinness’s observation that “Euler was the high priest of sum-worship, for he was cleverer than anyone else at inventing unorthodox methods of summation” [17]. It goes without saying that the high priest was agnostic about subtle convergence questions accompanying his proof. Such matters would have to await the next century.

One other fact leaps off the page. Although Euler had summed expressions like $\sum_{k=1}^{\infty} \frac{1}{k^2}$ and $\sum_{k=1}^{\infty} \frac{1}{k^4}$, along with many others having even exponents, he did not explicitly sum $\sum_{k=1}^{\infty} \frac{1}{k^3}$ or other series with odd exponents. The value of such quantities, wrote Euler, “can be expressed neither by logarithms nor by the circular periphery π, nor can a value be assigned by any other finite means” [18]. At one point, stumped by this vexing problem, an apparently frustrated Euler conceded that it would be “to no purpose” for him to investigate further [19]. It says something for his analytic intuition that to this day the nature of these odd-powered series remains far from clear. One suspects that if Euler failed to find a simple solution, it does not exist.

We conclude with one other significant contribution to analysis:
Euler’s ideas on extending factorials to noninteger inputs.
THE GAMMA FUNCTION


An interesting mathematical exercise is to interpolate a formula involv- 
ing whole numbers. That is, we seek an expression, defined across a larger 
domain, that agrees with the original formula when the input is a positive 
integer. 

By way of clarification, consider the following example discussed by 
Philip Davis in an article on the origins of the gamma function [20]. For 
any positive integer n, we let $S(n) = 1 + 2 + 3 + \cdots + n$ be the sum of the
first n whole numbers. Clearly, S(4) = 1+2+3+4= 10. It would make 
no sense, however, to talk about the sum of the first four-and-a-quarter 
numbers. 

To make that leap, we introduce a function T defined for all real x by $T(x) = \frac{x(x+1)}{2}$. Here T interpolates S, for when n is a whole number, $S(n) = 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} = T(n)$. But now we can evaluate T(4.25) = 11.15625. In this way, the function T “fills the gaps” in our representation of S, or, as Davis put it, “the formula extends the scope of the original problem to values of the variable other than those for which it was originally defined.”

In fact, this is what Newton did with his generalized binomial 
expansion. Rather than restrict himself to whole number powers of 
$(1 + x)^n$, he dealt with fractional or negative exponents in a way that
matched, that is, interpolated, the familiar situation when n was a posi- 
tive integer. 

In 1729, the ever-curious Euler took up an analogous challenge for the 
product of the first n whole numbers. That is, he sought a formula defined 
for all positive real numbers that agreed with $1 \cdot 2 \cdot 3 \cdots n$ when the
input n was a positive integer. To use modern terminology, Euler sought to 
interpolate the factorial. 

His first solution appeared in a letter to Christian Goldbach from 
October of 1729 [21]. There, he proposed the bizarre-looking infinite 
product 


$$\frac{1^{1-x} \cdot 2^x}{1+x} \times \frac{2^{1-x} \cdot 3^x}{2+x} \times \frac{3^{1-x} \cdot 4^x}{3+x} \times \frac{4^{1-x} \cdot 5^x}{4+x} \times \cdots. \qquad (11)$$




At different times, Euler denoted this expression by A(x) and by [x]. For 
the remainder of the chapter, we shall use the latter. From (11) one sees 
that 


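$$[1] = \frac{1 \cdot 2}{2} \times \frac{3}{3} \times \frac{4}{4} \times \frac{5}{5} \times \cdots = 1,$$

$$[2] = \frac{1 \cdot 2 \cdot 2}{3} \times \frac{3 \cdot 3}{2 \cdot 4} \times \frac{4 \cdot 4}{3 \cdot 5} \times \frac{5 \cdot 5}{4 \cdot 6} \times \cdots = 2,$$

$$[3] = \frac{1 \cdot 2 \cdot 2 \cdot 2}{4} \times \frac{3 \cdot 3 \cdot 3}{2 \cdot 2 \cdot 5} \times \frac{4 \cdot 4 \cdot 4}{3 \cdot 3 \cdot 6} \times \frac{5 \cdot 5 \cdot 5}{4 \cdot 4 \cdot 7} \times \frac{6 \cdot 6 \cdot 6}{5 \cdot 5 \cdot 8} \times \cdots = 6,$$

and so on, where the infinitude of cancellations serves to obscure questions of convergence. Nonetheless, this infinite product seems to do the trick: if n is a whole number, then [n] = n!.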

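And [x] allows gap-filling. We can consider, for instance, [1/2], which is the value that should be assigned to the interpolation of (1/2)!. When Euler substituted x = 1/2, he got

$$\left[\frac12\right] = \frac{\sqrt1 \cdot \sqrt2}{3/2} \times \frac{\sqrt2 \cdot \sqrt3}{5/2} \times \frac{\sqrt3 \cdot \sqrt4}{7/2} \times \frac{\sqrt4 \cdot \sqrt5}{9/2} \times \cdots = \sqrt{\frac{2 \cdot 4}{3 \cdot 3} \times \frac{4 \cdot 6}{5 \cdot 5} \times \frac{6 \cdot 8}{7 \cdot 7} \times \frac{8 \cdot 10}{9 \cdot 9} \times \cdots}.$$

Something about the expression under the radical looked familiar. He recalled a 1655 formula due to John Wallis, who, using an arcane interpolation procedure of his own, had shown that

$$\frac{3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdot 7 \cdot 9 \cdot 9 \cdots}{2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdot 8 \cdot 8 \cdot 10 \cdots} = \frac{4}{\pi}$$

[22]. With this, Euler deduced that

$$\left[\frac12\right] = \sqrt{\frac{\pi}{4}} = \frac12\sqrt{\pi}.$$

We are thus forced to conclude that the “natural” interpolation of $\left(\frac12\right)!$ is the very unnatural $\frac12\sqrt{\pi}$. That in itself deserves an exclamation point.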
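The product (11) can be truncated and evaluated directly. In the sketch below (hypothetical function name; the number of factors is an arbitrary choice), every factor equals 1 when x = 1, the partial products creep toward 2 and 6 for x = 2 and 3, and for x = 1/2 they approach ½√π.

```python
import math

def euler_product(x: float, factors: int) -> float:
    """Product of the first `factors` factors of (11):
    k^(1 - x) * (k + 1)^x / (k + x) for k = 1, 2, ..., factors."""
    result = 1.0
    for k in range(1, factors + 1):
        result *= k ** (1 - x) * (k + 1) ** x / (k + x)
    return result

for x in (1, 2, 3, 0.5):
    print(x, euler_product(x, 100_000))
print("sqrt(pi)/2 =", math.sqrt(math.pi) / 2)
```

The convergence is slow (the relative error of the truncated product shrinks roughly like 1/N for N factors), which is one reason Euler went looking for other representations.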


This answer provided Euler with a valuable clue. Because π appeared
in the result, he surmised that a connection to circular area may lie
somewhere beneath the surface, and this, in turn, suggested that he
direct his search towards integrals [23]. With a bit of effort, he arrived at
the alternative formula

$$[x] = \int_0^1 (-\ln t)^x\,dt. \qquad (12)$$

This result is far more compact than (11) and much more elegant. The skeptic can apply equal measures of integration by parts, l’Hospital’s rule, and mathematical induction to confirm that, when n is a whole number, $\int_0^1 (-\ln t)^n\,dt = n!$.

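Once he had an integral to play with, Euler was in his element. After a few more mathematical gyrations, he found that (see [24])

$$\left[\frac12\right] = \sqrt{\int_0^1 \frac{x^2\,dx}{\sqrt{1-x^2}} \bigg/ \int_0^1 \frac{x\,dx}{\sqrt{1-x^2}}}.$$

A bit of elementary calculus shows that $\int_0^1 \frac{x^2\,dx}{\sqrt{1-x^2}} = \frac{\pi}{4}$ and $\int_0^1 \frac{x\,dx}{\sqrt{1-x^2}} = 1$, so here is another confirmation, this time without resorting to Wallis’s formula, that $\left[\frac12\right] = \sqrt{\frac{\pi}{4}} = \frac12\sqrt{\pi}$.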


Euler also recognized that [x] = x · [x − 1], a relationship he exploited to the hilt in deriving results like

$$\left[\frac52\right] = \frac52 \times \left[\frac32\right] = \frac52 \times \frac32 \times \left[\frac12\right] = \frac{15}{8}\sqrt{\pi}$$

[25]. Then, always a true believer in the persistence of patterns, he pushed the recursion in the other direction to get $\left[\frac12\right] = \frac12 \times \left[-\frac12\right]$ and so $\left[-\frac12\right] = 2 \times \left[\frac12\right] = \sqrt{\pi}$. In other words, $\left(-\frac12\right)!$ should be interpreted as $\sqrt{\pi}$. By now it should be evident that intuition has a long way to go to catch up with calculus.
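Euler’s recursion alone, seeded with [1/2] = ½√π, reaches every half-integer in both directions. The sketch below uses a hypothetical helper; as a cross-check it relies on the fact that Python’s `math.gamma(x + 1)` equals [x] wherever both are defined.

```python
import math

def bracket_half_integer(x: float) -> float:
    """[x] for x in {..., -3/2, -1/2, 1/2, 3/2, ...}, built only from
    [1/2] = sqrt(pi)/2 and Euler's recursion [x] = x * [x - 1]."""
    if (2 * x) % 2 != 1:
        raise ValueError("x must be an odd multiple of 1/2")
    value, current = math.sqrt(math.pi) / 2, 0.5
    while current < x:      # step up: [c + 1] = (c + 1) * [c]
        current += 1
        value *= current
    while current > x:      # step down: [c - 1] = [c] / c
        value /= current
        current -= 1
    return value

for x in (-1.5, -0.5, 0.5, 1.5, 2.5):
    print(x, bracket_half_integer(x), math.gamma(x + 1))
```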

Modern mathematicians tend to follow a modification of Euler’s ideas popularized by Adrien-Marie Legendre (1752-1833). Legendre substituted y = −ln t into (12) to get $[x] = -\int_\infty^0 y^x e^{-y}\,dy = \int_0^\infty y^x e^{-y}\,dy$ and then shifted the input by one unit to define the gamma function by

$$\Gamma(x) = [x - 1] = \int_0^\infty y^{x-1} e^{-y}\,dy.$$


It is worth noting, however, that this very integral shows up in Euler’s
writings as well [26].
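Legendre’s integral lends itself to a quick numerical check. This sketch truncates the integral at an arbitrary upper limit and applies a midpoint rule (both our own choices, not a recommended quadrature). It recovers Γ(5) ≈ 4! = 24 and Γ(3/2) ≈ ½√π, and then runs the recursion Γ(x + 1) = xΓ(x) backward to reach Γ(1/2) ≈ √π without integrating y^(−1/2), which blows up at y = 0. The function name is hypothetical.

```python
import math

def gamma_integral(x: float, upper: float = 60.0, n: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of y^(x-1) e^(-y) from 0 to `upper`.
    Intended for x >= 1, where the integrand stays bounded near y = 0."""
    h = upper / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * h
        total += y ** (x - 1) * math.exp(-y)
    return h * total

g_five = gamma_integral(5.0)          # compare with 4! = 24
g_three_halves = gamma_integral(1.5)  # compare with sqrt(pi)/2
g_half = g_three_halves / 0.5         # Gamma(x) = Gamma(x + 1) / x, run backward
print(g_five, g_three_halves, g_half, math.sqrt(math.pi))
```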

Of course, the gamma function inherits properties that Euler had discovered about [x], such as the recursion Γ(x + 1) = xΓ(x) or the remarkable
identity Γ(1/2) = [−1/2] = √π. It is a function that seems to appear anywhere sophisticated mathematical analysis is practiced, from probability
to differential equations to analytic number theory. Nowadays, the gamma 
function is regarded as the first and perhaps most important of the “high- 
er functions” of analysis, that is, those whose very definition requires the 
ideas of calculus. It occupies a place beyond the algebraic, exponential, or 
trigonometric functions that characterize elementary mathematics. And 
we owe it, like so much else, to Euler. 

The results of this chapter—be they differentials or integrals, approxi- 
mations or interpolations—reveal an astonishing ingenuity. Von Neumann 
called Euler “the greatest virtuoso of the period,” for he posed the right 
questions and, with an agility and intuition that continue to amaze, regu- 
larly found the right answers [27]. Without doubt, Euler was at home in 
analysis, the perfect arena in which to apply what seemed to be his infor- 
mal credo: Follow the formulas, and they will lead to the truth. 

No one ever did it better. 


CHAPTER 5 




First Interlude 


Euler died in 1783, one year short of the centennial of
Leibniz’s first paper on differential calculus. By any measure, it had been a 
remarkable century in the history of mathematics. The results considered 
thus far, although a tiny fraction of the century’s output, illustrate the
progress that had been made. Grappling with infinite processes to discover 
correct and sometimes spectacular results, Newton, Leibniz, the Bernoullis, 
and Euler had established calculus as the mathematical subdiscipline par 
excellence. Our hats are off to these great originators. 

An important trend of that first century was a shift in perspective from 
the geometric to the analytic. As the problems became more challenging, 
their solutions depended less on the geometry of curves than on the alge- 
bra of functions. The complicated diagrams that Leibniz used to prove his 
transmutation theorem in 1673 had no counterpart in Euler's work from 
the middle of the eighteenth century. In this sense, analysis had assumed a 
more modern look. 

But other familiar aspects of the subject were nowhere to be seen. 
Largely missing, for instance, was that bulwark of modern analysis, the 
inequality. Seventeenth- and eighteenth-century mathematicians dealt
mainly in equations. Their work tended to employ clever substitutions that 
transformed one formula into another so as to emerge with the desired 
answer. Although Jakob Bernoulli’s divergence proof of the harmonic
series (see chapter 3) featured a deft use of inequalities, such an approach 
was rare. 

Rare as well was the study of broad classes of functions. Euler and his 
predecessors were adept at looking at specific integrals or series, but they 
were less interested in common properties of, say, continuous or differen- 
tiable functions. A shift in focus from the specific to the general would be 
a hallmark of the coming century. 

One other striking difference between early calculus and that of today 
is the attention given to logical foundations. As we have seen, mathemati- 
cians of the period used results whose validity they had neither proved nor, 
in many cases, even considered. An example was the tendency to replace
the integral of an infinite series by the infinite series of integrals, that is, to
equate $\int \sum_{k=1}^{\infty} f_k(x)\,dx$ and $\sum_{k=1}^{\infty} \int f_k(x)\,dx$. Both operations here (integrating functions and summing series) involve infinite processes
whose uncritical interchange can lead to incorrect results. Certain condi- 
tions must be met before a reversal of this sort is appropriate. On this 
front, the calculus pioneers operated more on intuition than on reason. 
Admittedly, their intuition was often very good, with Euler in particular 
possessing an uncanny ability to know just how far he could go before 
plunging into the mathematical abyss. 

Still, the foundations of calculus were suspect. As an illustration, we 
recall the role played by infinitely small quantities. Attempts to explain 
these so-called infinitesimals—and everyone from Leibniz to Euler gave it 
a shot—never proved satisfactory. Like a mathematical chameleon, infini- 
tesimals seemed inevitably to be both zero and nonzero at the same time. 
At rock bottom, they were paradoxical, counterintuitive entities. 

Nor were things much better when mathematicians based their con- 
clusions on “vanishing” quantities. Newton was a proponent of this 
dynamic approach, a fitting position, perhaps, for one so captivated by the 
study of motion. Introducing what we now call the derivative, he consid- 
ered a quotient of vanishing quantities and wrote that, by the “ultimate 
ratio” of these evanescent quantities, he meant “the ratio of the quantities 
not before they vanish, nor afterwards, but with which they vanish” [1]. 
Besides conjuring up the notion of a quantity after it vanishes (whatever 
that means), Newton asked his readers to imagine a ratio at the precise 
instant when—poof!—both numerator and denominator simultaneously 
dissolve into thin air. His description seemed ripe for criticism. 

It was not long in coming, and the critic was George Berkeley 
(1685-1753), noted philosopher and Bishop of Cloyne. In his 1734 essay 
The Analyst, Berkeley ridiculed those scientists who accused him of pro- 
ceeding on faith and not reason, yet who themselves talked of infinitely 
small or vanishing quantities. To Berkeley this was at best fuzzy thinking 
and at worst hypocrisy. The latter was implied in the long subtitle: 


A Discourse Addressed to an Infidel Mathematician, wherein It Is 
Examined Whether the Object, Principles, and Inferences of the 
Modern Analysis Are More Distinctly Conceived, or More Evi- 
dently Deduced, than Religious Mysteries and Points of Faith [2] 
Berkeley’s essay was caustic. Whether the calculus was built upon
Newton’s vanishing quantities or Leibniz’s infinitely small ones made little
difference to the bishop, who concluded that, “The further the mind analy- 
seth and pursueth these fugitive ideas, the more it is lost and bewildered” 
[3]. When skewering Newton, Berkeley penned the now famous question: 


And what are these fluxions? The velocities of evanescent incre- 
ments? And what are these same evanescent increments? They are 
neither finite quantities nor quantities infinitely small, nor yet noth- 
ing. May we not call them the ghosts of departed quantities? [4] 


He was no kinder to Leibniz’s infinitesimals. Admitting that the notion
of an infinitely small quantity was “above my capacity,” he mockingly 
observed that an infinitely small part of an infinitely small quantity, for 
instance, $(dx)^2$, presented “an infinite difficulty to any man whatsoever” [5].

Berkeley did not dispute the conclusions that mathematicians had 
drawn from these suspect techniques; it was the logic behind them that he 
rejected. True, the calculus was a wonderful vehicle for finding tangent lines 
and determining maxima or minima. But he argued that its correct answers 
arose from incorrect thinking, as certain mistakes cancelled out others in a 
compensation of errors that obscured the underlying flaws. “Error,” he 
wrote, “may bring forth truth, though it cannot bring forth science” [6]. 

We illustrate Berkeley’s point with his example, using modern notation, of finding $\frac{dy}{dx}$ when $y = x^n$. In the fashion of the day, he began by augmenting x with a tiny, nonzero increment o and developing the differential quotient

$$\frac{(x + o)^n - x^n}{o} = \frac{\left[x^n + nx^{n-1}o + \frac{n(n-1)}{2}x^{n-2}o^2 + \cdots + nxo^{n-1} + o^n\right] - x^n}{o} = nx^{n-1} + \frac{n(n-1)}{2}x^{n-2}o + \cdots + nxo^{n-2} + o^{n-1}.$$

Up to this point, o was assumed to be nonzero, a supposition, Berkeley stressed, “without which I should not have been able to have made so much as a single step.” But then o suddenly became zero, so that

$$\frac{dy}{dx} = nx^{n-1}.$$

Berkeley objected that the second assumption was in absolute conflict with the first and consequently negated any conclusions derived here. After all, if o is zero, not only are we forbidden to put it into a denominator,
but we must concede that x was never augmented at all. The argument
collapses in a heap. “When it is said, let the increments vanish,” wrote
Berkeley, “the former supposition that the increments were something . . .
is destroyed, and yet a consequence of that supposition, i.e., an expression
got by virtue thereof, is retained” [7].
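Berkeley’s complaint can be made concrete with exact rational arithmetic. In the sketch below (hypothetical helper names; x = 2 and n = 5 chosen arbitrarily), the raw quotient [(x + o)^n − x^n]/o and the expanded expression agree for every nonzero o, yet only the expanded form can be evaluated at o = 0; the raw quotient raises `ZeroDivisionError` there. Setting o = 0 in the simplified expression is precisely the step Berkeley attacked.

```python
from fractions import Fraction
from math import comb

def difference_quotient(x: Fraction, o: Fraction, n: int) -> Fraction:
    """((x + o)^n - x^n) / o; defined only while o != 0."""
    return ((x + o) ** n - x ** n) / o

def expanded_quotient(x: Fraction, o: Fraction, n: int) -> Fraction:
    """n x^(n-1) + C(n,2) x^(n-2) o + ... + o^(n-1): the quotient after cancelling o."""
    return sum((comb(n, k) * x ** (n - k) * o ** (k - 1) for k in range(1, n + 1)),
               Fraction(0))

x, n = Fraction(2), 5
for o in (Fraction(1, 10), Fraction(1, 1000)):
    # While o is nonzero the two expressions agree exactly.
    assert difference_quotient(x, o, n) == expanded_quotient(x, o, n)
print(expanded_quotient(x, Fraction(0), n))  # 80 = n * x^(n-1), obtained by now setting o = 0
```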

To the Bishop, such a method of reasoning was wholly unsatisfactory 
and represented “a most inconsistent way of arguing, and such as would 
not be allowed of in Divinity” [8]. In one of The Analyst's most searing pas- 
sages, Berkeley compared the faulty logic of calculus to the high standards 
that are required “throughout all the branches of humane knowledge, in 
any other of which, I believe, men would hardly admit such a reasoning as 
this which, in mathematics, is accepted for demonstration” [9]. 

Bishop Berkeley had made his point. Although the results of calculus 
seemed to be valid and, when applied to real-world phenomena like 
mechanics or optics, yielded solutions that agreed with observations, 
none of this mattered if the foundations were rotten. 

Something had to be done. Over the next decades a number of mathe- 
maticians tried to shore up the shaky underpinnings. Among these was 
Jean-le-Rond d’Alembert (1717-1783), a highly respected scholar who 
worked alongside Diderot (1713-1784) on the Encyclopédie in France. 
Regarding the foundations of calculus, d’Alembert agreed that infinitely 
small and/or vanishing quantities were meaningless. He proclaimed, without 
equivocation, that “a quantity is something or nothing; if it is something, it 
has not yet vanished; if it is nothing, it has literally vanished. The supposi- 
tion that there is an intermediate state between these two is a chimera” [10]. 

As an alternative, d’Alembert proposed that calculus be based upon the concept of limit. In treating the derivative, he identified $\frac{dy}{dx}$ as the limit of a quotient of finite terms, which he wrote as $\frac{z}{u}$ but which we recognize as $\frac{y(x + \Delta x) - y(x)}{\Delta x}$. Then, $\frac{dy}{dx}$ is “the quantity to which the ratio z/u approaches more and more closely if we suppose z and u to be real and decreasing. Nothing is clearer than this” [11].

D’Alembert was onto something. He had no use for infinitesimals nor 
vanishing quantities and deserves credit for highlighting limits as the way 
to repair the weak foundations of the calculus. 

But it would be going too far to assert that d’Alembert saved the day. 
Although he may have sensed the right path, he did not follow it very far. 
Missing was a clear definition of “limit” and the subsequent derivation of 
basic calculus theorems from it. In the end, d’Alembert did little more
than suggest the way out of trouble. A full development of these ideas 
would have to wait a generation and more. 

Meanwhile, a greater mathematician weighed in on the matter and 
offered a very different solution. He was Joseph-Louis Lagrange (1736- 
1813), a powerful and influential figure in European mathematics as the 
eighteenth century wound down. On the question of foundations, Lagrange 
vowed to provide a logically sound framework upon which the great edi- 
fice of calculus could be built. In his 1797 work Théorie des fonctions analy- 
tiques, he envisioned a calculus “freed from all considerations of infinitely 
small quantities, vanishing quantities, limits and fluxions” [12]. Seeing no 
merit in any of the past justifications, Lagrange vowed to start anew. 

His fundamental idea was to regard infinite series not as the output 
but as the source of differential calculus. That is, beginning with a function 
f(x) whose derivative he sought, Lagrange expressed f(x + i) as an infinite 
series in i of the form 


$$f(x + i) = f(x) + i\,p(x) + i^2 q(x) + i^3 r(x) + \cdots \qquad (1)$$


in which, as he put it, “p, q, r, . . . will be new functions of x, derived from
the primitive function fx and independent of the indeterminate i” [13].
Then the (first) derivative of f was no more and no less than p(x), the func- 
tion serving as the coefficient of i in this expansion. 

Anyone familiar with Taylor series can see what Lagrange was up to, 
but it is important to note that, for him, the series came first and the deriv- 
ative was a consequence, whereas in modern analysis it is the derivative 
that precedes the series. 

An example might be helpful. Suppose we want to find the derivative $f'(x)$ when $f(x) = \frac{1}{x^3}$. (By the way, the “f-prime” notation is due to Lagrange.) Expanding the function as in (1), we have $\frac{1}{(x + i)^3} = \frac{1}{x^3} + i\,p(x) + i^2 q(x) + i^3 r(x) + \cdots$ so that

$$i\,[p(x) + i\,q(x) + i^2 r(x) + \cdots] = \frac{1}{(x + i)^3} - \frac{1}{x^3} = \frac{-3x^2 i - 3x i^2 - i^3}{x^3 (x + i)^3}$$

and therefore

$$p(x) + i\,q(x) + i^2 r(x) + \cdots = \frac{-3x^2 i - 3x i^2 - i^3}{i\,x^3 (x + i)^3} = \frac{-3x^2 - 3x i - i^2}{x^3 (x + i)^3}. \qquad (2)$$

At this point, Lagrange let $i = 0$ in (2) to get $p(x) = \frac{-3x^2}{x^6} = -\frac{3}{x^4}$. Thus, $f'(x) = -\frac{3}{x^4}$, which of course would have been no surprise to Newton or Leibniz.
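Lagrange’s recipe (expand $f(x + i)$ in powers of $i$, then read off the coefficient of $i$) can be sketched for the special case $f(x) = x^n$, where the binomial series supplies the expansion. This is an illustrative assumption: Lagrange worked algebraically, and the helper names `gen_binom` and `lagrange_series` are hypothetical. Exact fractions show that for $f(x) = 1/x^3$ at $x = 2$ the coefficient $p$ equals $-3/16$, matching $-3/x^4$.

```python
from fractions import Fraction

def gen_binom(n, k):
    """Generalized binomial coefficient n(n-1)...(n-k+1)/k!; works for negative n."""
    c = Fraction(1)
    for j in range(k):
        c *= Fraction(n - j, j + 1)
    return c

def lagrange_series(n, x, terms=4):
    """Coefficients [f(x), p(x), q(x), r(x), ...] in
    (x + i)^n = f(x) + i p(x) + i^2 q(x) + i^3 r(x) + ...  (valid for |i| < |x|)."""
    x = Fraction(x)
    return [gen_binom(n, k) * x ** (n - k) for k in range(terms)]

# f(x) = 1/x^3 = x^(-3) at x = 2; Lagrange's derivative is the coefficient of i.
coeffs = lagrange_series(-3, 2)
p = coeffs[1]
print(p, Fraction(-3, 2**4))   # p(2) alongside -3/x^4 evaluated at x = 2
```

The sketch sidesteps exactly the questions raised below: it works only because the binomial series for $x^n$ is known to converge to the right function for small $i$.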

For Lagrange, this derivation avoided quantities that were infinitely 
small as well as those ghosts of departed quantities vanishing into oblivion. 
Likewise, he had no need for d’Alembert’s uncertainly defined limits. When 
Lagrange let i = 0, he meant that literally. No pitfalls were encountered in
(2), for no zero appeared in any denominator. He regarded this as a purely 
analytic approach to the derivative, one requiring none of the logical gyra- 
tions that had embarrassed his predecessors. It was all so neat and tidy. 

Or was it? For one thing, defining derivatives in this manner is terri- 
bly indirect. The ideas of Newton and Leibniz—even if cluttered with 
curves and triangles and resting upon a shaky foundation—were at least 
straightforward in their object. Lagrange’s ideas, presented without a sin- 
gle diagram, completely obscured the fact that derivatives had something 
to do with slopes of tangent lines. 

That is a minor criticism. More troubling was the question of how to 
proceed for less trivial functions than that given above. In our example, the key was to expand and simplify $\frac{1}{(x + i)^3} - \frac{1}{x^3}$ in order to factor $i$ from the result. But where is the guarantee that every function could be so expanded
and simplified? Where is the guarantee that a series so constructed is con- 
vergent? And where is the guarantee that a convergent series so construct- 
ed actually converges to the function we started with? These are deep and 
important questions. 

Ultimately, the theory of Lagrange could not withstand this kind of 
scrutiny. In 1822 the French mathematician Augustin-Louis Cauchy pub- 
lished an example that proved fatal to Lagrange’s ideas. Cauchy, who will 
be the subject of our next chapter, showed that the function

$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0, \end{cases}$$

and all of its derivatives are zero at $x = 0$ [14]. Consequently, as a power series about the origin, $f(x) = 0 + 0 \cdot x + 0 \cdot x^2 + 0 \cdot x^3 + \cdots = 0$, which in turn means that, if we begin with $f$ and write it as a series, we end up with a different function than we started with! As a series, we would find it impossible to distinguish between $f$ above and the constant function $g(x) = 0$. Cauchy’s example of two distinct functions sharing a power series indicated that analysis was considerably less benign than Lagrange had assumed.
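Cauchy’s counterexample can be made tangible with a short numerical check, offered as an illustration rather than a proof. The sketch below (function names hypothetical) evaluates $f(x)/x^n$ for shrinking $x$; for each power tried, the quotients collapse toward zero. That extreme flatness at the origin is what forces every Taylor coefficient of $f$ there to vanish, even though $f(x) > 0$ for all $x \neq 0$.

```python
import math

def cauchy_f(x):
    """Cauchy's example: exp(-1/x^2) for x != 0, and 0 at x = 0."""
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

def flatness(n, xs):
    """The quotients f(x) / x**n at the sample points xs."""
    return [cauchy_f(x) / x**n for x in xs]

xs = [0.5, 0.2, 0.1, 0.05]
for n in (1, 5, 10):
    # for every power n, the quotients plunge toward 0 as x approaches 0
    print(n, flatness(n, xs))
```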

In the end, a series-based definition of the derivative—and hence a 
series-based foundation for the calculus—was abandoned. But if Lagrange 
failed in his primary mission, he made a number of contributions that 
anticipated the coming century. First, he elevated foundational questions 
into greater prominence, treating them as both interesting and important 
issues. Second, he tried to derive the theorems of the calculus from his 
basic definitions, in the process introducing inequalities and exhibiting 
skill in their use. Finally, as Judith Grabiner observed in her book, The 
Origins of Cauchy’s Rigorous Calculus: 


On reading Lagrange’s work, one is struck by his feeling for the
general... . His extreme love of generality was unusual for this 
time and contrasts with the emphasis of many of his contempo- 
raries on solving specific problems. His algebraic foundation for 
the calculus was consistent with his generalizing tendency. [15] 


All these contributions notwithstanding, the eighteenth century ended 
with the logical crisis still unresolved. The work of d'Alembert and 
Lagrange, along with others who addressed these matters, failed to mollify 
the critics. As late as 1800, the words of Bishop Berkeley carried the ring of 
truth: “I say that in every other Science Men prove their Conclusions by their 
Principles, and not their Principles by the Conclusions” [16]. 

But a resolution was near. The same Cauchy who recognized the 
nonuniqueness of series would, in the early nineteenth century, see a way 
to explain the foundations of calculus in a satisfactory manner. By the time 
he was done, analysis would be a far more general, abstract, and inequality- 
laden subject than his predecessors could have imagined. And it would be 
far more rigorous. 

It is to this towering figure, and to his revolution, that we now turn. 


CHAPTER 6 




Cauchy 


Augustin-Louis Cauchy 


Eric Temple Bell, who popularized mathematicians in colorful if
sometimes immoderate prose, wrote that “Cauchy's part in modern math- 
ematics is not far from the center of the stage” [1]. It is hard to argue with 
this judgment. During his career, Augustin-Louis Cauchy (1789-1857) 
published books and papers that now fill over two dozen volumes of col- 
lected works, and among these are treatises on combinatorics and algebra, 
differential equations and complex variables, mechanics, and optics. Like 
Leonhard Euler from the century before, Augustin-Louis Cauchy cast a 
long shadow. 

His impact upon the history of calculus is especially profound. Cauchy 
stands at a boundary between the early practitioners, who, for all their 
cleverness, occupied a more intuitive, more innocent world, and the 
mathematicians of today, for whom the logical standards are strict, perva- 
sive, and unforgiving. Cauchy did not complete this transformation, for 
his ideas would require considerable fine tuning in the decades to come.
But the similarity between Cauchy’s development of analysis and that of
today’s textbooks cannot fail to impress the modern reader. 

This chapter gives a taste of Cauchy in action. We include a number of 
examples, ranging from his theory of limits to the mean value theorem 
and from his definition of the integral to the fundamental theorem of cal- 
culus, before concluding with a pair of tests for series convergence. This 
material comes from two great texts: his 1821 Cours d’analyse de l’École
Royale Polytechnique and his 1823 Résumé des leçons données à l’École Royale
Polytechnique, sur le calcul infinitésimal [2].


LIMITS, CONTINUITY, AND DERIVATIVES


Although Cauchy recognized Lagrange as an elder statesman of math- 
ematics, he could not endorse the latter’s series-based definition of the 
derivative. “I reject the development of functions by infinite series,” wrote 
Cauchy, who continued: 


I do not ignore that the illustrious [Lagrange] has taken this for- 
mula as the basis for his theory of derived functions. But, in spite 
of the respect commanded by so great an authority, most geome- 
ters now acknowledge the uncertainty of results to which one can 
be led by use of divergent series .. . and we add that [Lagrange’s 
methods] lead to the development of a function by a convergent 
series, although the sum of this series differs essentially from the 
function proposed. [3] 


The last allusion is to Cauchy’s counterexample mentioned in the previous chapter. For him, Lagrange’s program was a dead end. Hoping to
provide a logically valid alternative, Cauchy asserted that “the principles 
of differential calculus, and their most important applications, can easily 
be developed without the need of series.” 

Instead, Cauchy believed that the foundation upon which all calculus 
would be built was the idea of limit. His definition of this concept is a 
mathematical classic: 


When the values successively attributed to a variable approach 
indefinitely to a fixed value, in a manner so as to end by differing 
from it by as little as one wishes, this last is called the limit of all 
the others. [4] 
Cauchy gave the example of a circle’s area as the limit of the areas of
inscribed regular polygons as the number of sides increases without 
bound. Of course, no polygonal area ever equals that of the circle. But for 
any proposed tolerance, an inscribed regular polygon can be found whose 
area, and those of all inscribed regular polygons with even more sides, is 
closer to that of the circle than the tolerance stipulated. Polygonal areas 
get close—and stay close—to the area of the circle. This is the essence of 
Cauchy’s idea. 
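The polygon example can be checked directly. In this minimal sketch (names are illustrative), the inscribed regular $n$-gon is split into $n$ isosceles triangles, giving area $\tfrac{n}{2} r^2 \sin(2\pi/n)$. Because these areas increase with $n$, the first $n$ that comes within a given tolerance of $\pi r^2$ guarantees that every later polygon stays within it, which is exactly Cauchy’s “get close and stay close.”

```python
import math

def inscribed_polygon_area(n, r=1.0):
    """Area of a regular n-gon inscribed in a circle of radius r:
    n isosceles triangles, each with apex angle 2*pi/n at the center."""
    return 0.5 * n * r * r * math.sin(2 * math.pi / n)

def sides_needed(tolerance, r=1.0):
    """Smallest n >= 3 whose inscribed n-gon comes within `tolerance` of the circle.
    The areas increase with n, so every polygon with more sides stays that close."""
    circle = math.pi * r * r
    n = 3
    while circle - inscribed_polygon_area(n, r) >= tolerance:
        n += 1
    return n

n = sides_needed(1e-3)
print(n, math.pi - inscribed_polygon_area(n))
```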

A modern reader may be surprised by his definition’s wordiness, its dynamic imagery, and the absence of $\varepsilon$ and $\delta$. Nowadays we do not talk about a “succession” of numbers “approaching” something, and we tend to prefer the symbolic efficiency of “$\varepsilon > 0$” to the phrase “as little as one wishes.”

Yet this was an advance of the first order. Cauchy’s idea, based on 
“closeness,” avoided some of the pitfalls of earlier attempts. In particular, 
he said nothing about reaching the limit nor about surpassing it. Such 
issues ensnared many of Cauchy’s predecessors, as Berkeley had been only 
too happy to point out. By contrast, Cauchy’s so-called “limit avoidance” 
definition made no mention whatever of attaining the limit, just of getting 
and staying close to it. For him, there were no departed quantities, and 
Berkeley’s ghosts disappeared. 

Cauchy introduced a related concept that may raise a few eyebrows. 
He wrote that “when the successive numerical values of a variable 
decrease indefinitely (so as to become less than any given number), this 
variable will be called . . . an infinitely small quantity” [5]. His use of “infi- 
nitely small” strikes us as unfortunate, but we can regard this definition as 
simply spelling out what is meant by convergence to zero. 

Cauchy next turned his attention to continuity. Intuition might at first 
suggest that he had things backwards, that he should have based the idea 
of limits upon that of continuity and not vice versa. But Cauchy had it 
right. Reversing the “obvious” order of affairs was the key to understand- 
ing continuous functions. 

Starting with $y = f(x)$, he let $i$ be an infinitely small quantity (as defined above) and considered the function’s value when $x$ was replaced by $x + i$. This changed the functional value from $y$ to $y + \Delta y$, a relationship Cauchy expressed as

$$y + \Delta y = f(x + i) \quad\text{or}\quad \Delta y = f(x + i) - f(x).$$

If, for $i$ infinitely small, the difference $\Delta y = f(x + i) - f(x)$ was infinitely small as well, Cauchy called $f$ a continuous function of $x$ [6]. In other words, a function is continuous at $x$ if, when the independent variable $x$ is augmented by an infinitely small quantity, the dependent variable $y$ likewise grows by an infinitely small amount.

Again, reference to the “infinitely small” means only that the quantities 
have limit zero. In this light, we see that Cauchy has called f continuous at 
$x$ if $\lim_{i \to 0}[f(x + i) - f(x)] = 0$, which is equivalent to the modern definition,

$$\lim_{i \to 0} f(x + i) = f(x).$$

As an illustration, Cauchy considered $y = \sin x$ [7]. He used the fact that $\lim_{x \to 0}(\sin x) = 0$ and the trig identity $\sin(\alpha + \beta) - \sin\alpha = 2\sin(\beta/2) \cdot \cos(\alpha + \beta/2)$. Then, for infinitely small $i$ he observed:

$$\Delta y = f(x + i) - f(x) = \sin(x + i) - \sin x = 2\sin(i/2)\cos(x + i/2). \qquad (1)$$

Because $i/2$ is infinitely small, so is $\sin(i/2)$, and since the cosine factor never exceeds 1 in absolute value, so too is the entire right-hand side of (1). By Cauchy’s definition, the sine function is continuous at any $x$.
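Cauchy’s identity also yields a quantitative bound. Since $|\cos| \le 1$ and $|\sin t| \le |t|$, the increment satisfies $|\Delta y| \le |i|$ at every $x$. The following Python sketch (the name `delta_sin` and the sample points are hypothetical) checks the identity against direct subtraction and confirms the bound at a few points; floating-point checks of this kind illustrate the claim but do not prove it.

```python
import math

def delta_sin(x, i):
    """Increment sin(x + i) - sin(x), written in Cauchy's form 2 sin(i/2) cos(x + i/2)."""
    return 2 * math.sin(i / 2) * math.cos(x + i / 2)

points = [(x, i) for x in (-3.0, 0.0, 1.2, 10.0) for i in (0.1, 1e-3, -1e-5)]
for x, i in points:
    direct = math.sin(x + i) - math.sin(x)
    # the identity agrees with direct subtraction, and |delta y| never exceeds |i|
    print(x, i, delta_sin(x, i), direct, abs(delta_sin(x, i)) <= abs(i))
```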

We note that Cauchy also recognized one of the most important prop- 
erties of continuous functions: their preservation of sequential limits. That 
is, if $f$ is continuous at $a$ and if $\{x_k\}$ is a sequence for which $\lim_{k \to \infty} x_k = a$, then it follows that $\lim_{k \to \infty} f(x_k) = f\left(\lim_{k \to \infty} x_k\right) = f(a)$. We shall see him exploit this principle shortly.


He then considered “derived functions.” For Cauchy, the differential 
quotient was defined as 


$$\frac{\Delta y}{\Delta x} = \frac{f(x + i) - f(x)}{i},$$

where $i$ is infinitely small. Taking his notation from Lagrange, Cauchy denoted the derivative by $y'$ or $f'(x)$ and claimed that this was “easy” to determine for simple functions like

$$y = r + x,\; rx,\; r/x,\; x^r,\; A^x,\; \log_A x,\; \sin x,\; \cos x,\; \arcsin x, \text{ and } \arccos x.$$
We shall examine just one of these: $y = \log_A x$, the logarithm to base $A > 1$, which Cauchy denoted by $L(x)$ [8]. He began with the differential quotient

$$\frac{\Delta y}{\Delta x} = \frac{f(x + i) - f(x)}{i} = \frac{L(x + i) - L(x)}{i}$$

for $i$ infinitely small and introduced the auxiliary variable $\alpha = \frac{i}{x}$, which is infinitely small as well. Using rules of logarithms and substituting liberally, Cauchy reasoned that

$$\frac{\Delta y}{\Delta x} = \frac{L\left[x\left(1 + \frac{i}{x}\right)\right] - L(x)}{i} = \frac{L\left(1 + \frac{i}{x}\right)}{i} = \frac{L(1 + \alpha)}{\alpha x} = \frac{1}{x} \cdot \frac{1}{\alpha} L(1 + \alpha) = \frac{1}{x} L\left[(1 + \alpha)^{1/\alpha}\right]. \qquad (2)$$

For $\alpha$ infinitely small, he identified this last expression as $\frac{1}{x} L(e)$. Today we would invoke continuity of the logarithm and the fact that $\lim_{\alpha \to 0}(1 + \alpha)^{1/\alpha} = e$ to justify this step. In any case, Cauchy concluded from (2) that the derivative of $L(x)$ was $\frac{1}{x} L(e)$. As a corollary, he noted that the derivative of the natural logarithm $\ln(x)$ is $\frac{1}{x}\ln(e) = \frac{1}{x}$.


He obviously had his differential calculus well under control. 
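Both forms of Cauchy’s computation in (2) can be evaluated side by side. In this sketch the base $A = 10$ and the point $x = 2$ are arbitrary choices, and the helper names are hypothetical. As $i$ shrinks, the difference quotient and the expression $\frac{1}{x} L[(1 + \alpha)^{1/\alpha}]$ agree with each other and approach $\frac{1}{x} L(e)$.

```python
import math

def L(x, A=10.0):
    """Cauchy's L(x): the logarithm of x to a base A > 1 (here A = 10 by default)."""
    return math.log(x) / math.log(A)

def difference_quotient(x, i, A=10.0):
    """[L(x + i) - L(x)] / i, the differential quotient for y = L(x)."""
    return (L(x + i, A) - L(x, A)) / i

def cauchy_last_form(x, i, A=10.0):
    """(1/x) L[(1 + alpha)^(1/alpha)] with alpha = i/x: the last expression in (2)."""
    alpha = i / x
    return L((1 + alpha) ** (1 / alpha), A) / x

x = 2.0
target = L(math.e) / x   # Cauchy's answer, (1/x) L(e)
for i in (1e-2, 1e-4, 1e-6):
    print(i, difference_quotient(x, i), cauchy_last_form(x, i), target)
```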


THE INTERMEDIATE VALUE THEOREM 


Cauchy’s analytic reputation rests not only upon his definition of the 
limit. At least as significant was his recognition that the great theorems of 
calculus must be proved from this definition. Whereas earlier mathemati- 
cians had accepted certain results as true because they either conformed to 
intuition or were supported by a diagram, Cauchy seemed unsatisfied 
unless an algebraic argument could be advanced to prove them. He left no 
doubt of his position when he wrote that “it would be a serious error to 
think that one can find certainty only in geometrical demonstrations or in 
the testimony of the senses” [9]. 

His philosophy was evident in a demonstration of the intermediate 
value theorem. This famous result begins with a function f continuous 
between $x_0$ and $X$ (Cauchy's preferred designation for the endpoints of an interval). If $f(x_0) < 0$ and $f(X) > 0$, the intermediate value theorem asserts that the function must equal zero at one or more points between $x_0$ and $X$.

For those who trust their eyes, nothing could be more obvious. An 
object moving continuously from a negative to a positive value must 
somewhere slice across the x-axis. As indicated in figure 6.1, the intermediate value occurs at x = a, where f(a) = 0. It is tempting to ask, “What's the
big deal?” 

Of course, the big deal is that mathematicians hoped to free analysis 
from the danger of intuition and the allure of geometry. For Cauchy, even 
obvious things had to be proved with indisputable logic. 

In that spirit, he began his proof of the intermediate value theorem by 
letting $h = X - x_0$ and fixing a whole number $m > 1$ [10]. He then broke the interval from $x_0$ to $X$ into $m$ equal subintervals at the points $x_0, x_0 + h/m, x_0 + 2h/m, \ldots, X - h/m, X$ and considered the related sequence of functional values:

$$f(x_0),\; f(x_0 + h/m),\; f(x_0 + 2h/m),\; \ldots,\; f(X - h/m),\; f(X).$$

Because the first of these was negative and the last positive, he observed that, as we progress from left to right, we will find two consecutive functional values with opposite signs. More precisely, for some whole number $n$, we have

$$f(x_0 + nh/m) \le 0 \quad\text{but}\quad f(x_0 + (n + 1)h/m) \ge 0.$$

We follow Cauchy in denoting these consecutive points of subdivision by $x_0 + nh/m = x_1$ and $x_0 + (n + 1)h/m = X_1$. Clearly, $x_0 \le x_1 < X_1 \le X$, and the length of the interval from $x_1$ to $X_1$ is $h/m$.

He now repeated the procedure across the smaller interval from $x_1$ to $X_1$. That is, he divided it into $m$ equal subintervals, each of length $h/m^2$, and considered the sequence of functional values

$$f(x_1),\; f(x_1 + h/m^2),\; f(x_1 + 2h/m^2),\; \ldots,\; f(X_1 - h/m^2),\; f(X_1).$$


[Figure 6.1: the graph of y = f(x)]
Again, the leftmost value is less than or equal to zero, whereas the rightmost is greater than or equal to zero, so there must be consecutive points $x_2$ and $X_2$, a distance of $h/m^2$ units apart, for which $f(x_2) \le 0$ and $f(X_2) \ge 0$. At this stage, we have $x_0 \le x_1 \le x_2 < X_2 \le X_1 \le X$. Those familiar with the
bisection method for approximating solutions to equations should feel 
perfectly at home with Cauchy’s procedure. 

Continuing in this manner, he generated a nondecreasing sequence $x_0 \le x_1 \le x_2 \le x_3 \le \cdots$ and a nonincreasing sequence $\cdots \le X_3 \le X_2 \le X_1 \le X$, where all the values $f(x_k) \le 0$ and $f(X_k) \ge 0$ and for which the gap $X_k - x_k = h/m^k$. For increasing $k$, this gap obviously decreases toward zero, and from this Cauchy concluded that the ascending and descending sequences must converge to a common limit $a$. In other words, there is a point $a$ for which

$$\lim_{k \to \infty} x_k = a = \lim_{k \to \infty} X_k.$$
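Cauchy’s procedure is, in effect, an algorithm. The sketch below runs it numerically, assuming $f$ is continuous with $f(x_0) \le 0 \le f(X)$; the name `cauchy_ivt` and the defaults $m = 10$ and 12 rounds are illustrative choices, not from the source. Each round keeps the first pair of consecutive subdivision points where $f$ passes from $\le 0$ to $\ge 0$, so the bracket shrinks by a factor of $m$ every time.

```python
def cauchy_ivt(f, x0, X, m=10, steps=12):
    """Run Cauchy's subdivision argument numerically.

    Assumes f is continuous on [x0, X] with f(x0) <= 0 <= f(X). Each round splits
    the current interval into m equal parts and keeps the first consecutive pair
    of points where f goes from <= 0 to >= 0. Returns the final pair (x_k, X_k),
    whose width is (X - x0) / m**steps up to rounding."""
    lo, hi = x0, X
    for _ in range(steps):
        h = (hi - lo) / m
        # points lo, lo + h, ..., hi; the last is set to hi exactly so f(hi) >= 0 still holds
        pts = [lo + n * h for n in range(m)] + [hi]
        for left, right in zip(pts, pts[1:]):
            if f(left) <= 0 <= f(right):
                lo, hi = left, right
                break
    return lo, hi

# x^2 - 2 is negative at 0 and positive at 2, so a zero (namely sqrt 2) lies between.
lo, hi = cauchy_ivt(lambda x: x * x - 2, 0.0, 2.0)
print(lo, hi)
```

With $m = 2$ this is the familiar bisection method; a larger $m$ narrows the bracket faster per round at the cost of more function evaluations. A computer never produces Cauchy’s limit point $a$ itself, only ever-narrower intervals that contain a zero.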

We pause to comment on this last step. Cauchy here assumed a ver- 
sion of what we now call the completeness property of the real numbers. 
He took it for granted that, because the terms of the sequences $\{x_k\}$ and $\{X_k\}$ grow arbitrarily close to one another, they must converge to a common limit. One could argue that his belief in the existence of this point $a$
is as much a result of unexamined intuition as simply believing the inter- 
mediate value theorem in the first place. But such a judgment may be 
overly harsh. Even if Cauchy invoked an untested hypothesis, he had at 
least pushed the argument much deeper toward the core principles. If he 
failed to clear the path of all obstacles, he got rid of most of the brush 
underfoot. 

To finish the argument, Cauchy stated (without proof) that the point 
$a$ falls within the original interval from $x_0$ to $X$, and then he used the continuity of $f$ to conclude, in modern notation, that

$$f(a) = f\left(\lim_{k \to \infty} x_k\right) = \lim_{k \to \infty} f(x_k) \le 0 \quad\text{and}\quad f(a) = f\left(\lim_{k \to \infty} X_k\right) = \lim_{k \to \infty} f(X_k) \ge 0.$$

In Cauchy’s words, these inequalities established that “the quantity $f(a)$ . . . cannot differ from zero.” He had thus proved the existence of a number $a$ between $x_0$ and $X$ for which $f(a) = 0$. The general version of the intermediate value theorem, namely that a continuous function takes all values between $f(x_0)$ and $f(X)$, follows as an easy corollary.

This was a remarkable achievement. Cauchy had, for the most part, 
succeeded in demonstrating a “self-evident” principle by analytic methods. 
As Judith Grabiner observed, “though the mechanics of the proof are simple, the basic conception of the proof is revolutionary. Cauchy transformed the approximation technique into something entirely different: a proof of the existence of a limit” [11].


THE MEAN VALUE THEOREM 


We now turn to another staple of the calculus, the mean value theo- 
rem for derivatives [12]. In his Calcul infinitésimal, Cauchy began with a 
preliminary result. 


Lemma: If, for a function $f$ continuous between $x_0$ and $X$, one lets $A$ be the smallest and $B$ be the largest value that $f'$ takes on this interval, then

$$A \le \frac{f(X) - f(x_0)}{X - x_0} \le B.$$

Proof: We note that Cauchy’s reference to f’—and thus his unstated 
assumption that f is differentiable—would of course guarantee the 
continuity of f. Moreover, he assumed outright that the derivative 
takes a greatest and least value on the interval $[x_0, X]$. A modern
approach would treat these hypotheses with more care. 

If his statement seems peculiar, his proof began with a now-familiar ring, for Cauchy introduced two “very small numbers” $\delta$ and $\varepsilon$. These were chosen so that, for all positive values of $i < \delta$ and for any $x$ between $x_0$ and $X$, we have

$$f'(x) - \varepsilon < \frac{f(x + i) - f(x)}{i} < f'(x) + \varepsilon. \qquad (3)$$


Here Cauchy was assuming a uniformity condition for his choice of $\delta$. The existence of the derivative certainly means that, for any $\varepsilon > 0$ and for any fixed $x$, there is a $\delta > 0$ for which the inequalities of (3) hold. But such a $\delta$ depends on both $\varepsilon$ and the particular point $x$. Without additional results or assumptions, Cauchy could not justify the choice of a single $\delta$ that simultaneously works for all $x$ throughout the interval.

Be that as it may, he next subdivided the interval by choosing 
points 


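$$x_0 < x_1 < x_2 < \cdots < x_{n-1} < X,$$

where $x_1 - x_0, x_2 - x_1, \ldots, X - x_{n-1}$ “have numerical values less than $\delta$.” For these subdivisions, repeated applications of (3) and the fact that $A \le f'(x) \le B$ imply that

$$\begin{aligned}
A - \varepsilon \le f'(x_0) - \varepsilon &< \frac{f(x_1) - f(x_0)}{x_1 - x_0} < f'(x_0) + \varepsilon \le B + \varepsilon, \\
A - \varepsilon \le f'(x_1) - \varepsilon &< \frac{f(x_2) - f(x_1)}{x_2 - x_1} < f'(x_1) + \varepsilon \le B + \varepsilon, \\
&\;\;\vdots \\
A - \varepsilon \le f'(x_{n-1}) - \varepsilon &< \frac{f(X) - f(x_{n-1})}{X - x_{n-1}} < f'(x_{n-1}) + \varepsilon \le B + \varepsilon.
\end{aligned}$$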

Cauchy then observed that, “if one divides the sum of these numerators by the sum of these denominators, one obtains a mean fraction which is . . . contained between the limits $A - \varepsilon$ and $B + \varepsilon$.” Here he was using the fact that, if $b_k > 0$ for $k = 1, 2, \ldots, n$ and if $C < \frac{a_k}{b_k} < D$ for all $k$, then $C < \frac{\sum_{k=1}^{n} a_k}{\sum_{k=1}^{n} b_k} < D$ as well. Applying this result to the inequalities above, he found that

$$A - \varepsilon < \frac{[f(x_1) - f(x_0)] + [f(x_2) - f(x_1)] + \cdots + [f(X) - f(x_{n-1})]}{(x_1 - x_0) + (x_2 - x_1) + \cdots + (X - x_{n-1})} < B + \varepsilon,$$

which telescoped to $A - \varepsilon < \frac{f(X) - f(x_0)}{X - x_0} < B + \varepsilon$. Cauchy ended the proof with the statement that, “as this conclusion holds however small be the number $\varepsilon$, one can affirm that the expression $\frac{f(X) - f(x_0)}{X - x_0}$ will be bounded between $A$ and $B$.” Q.E.D.


This is an interesting argument, one that stumbles over the issue of 
uniformity yet demonstrates a genius in working with inequalities and 
employing the now-ubiquitous € and 6 to reach its desired conclusion. No 
one would confuse this level of generality and rigor with something from 
the early days of Newton and Leibniz. 
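The algebraic heart of the lemma, the “mean fraction” and the telescoping of its numerator, can be checked with exact arithmetic. This minimal sketch assumes equal subdivisions and uses $f(x) = x^3$ on $[0, 2]$, so $A = 0$ and $B = 12$; it does not model Cauchy’s choice of $\varepsilon$ and $\delta$. The function name `lemma_demo` is hypothetical.

```python
from fractions import Fraction

def lemma_demo(f, x0, X, n):
    """Subdivide [x0, X] into n equal elements and return
    (smallest quotient, mean fraction, largest quotient, overall quotient)."""
    x0, X = Fraction(x0), Fraction(X)
    pts = [x0 + (X - x0) * Fraction(k, n) for k in range(n + 1)]
    numerators = [f(b) - f(a) for a, b in zip(pts, pts[1:])]
    denominators = [b - a for a, b in zip(pts, pts[1:])]   # all positive
    quotients = [p / q for p, q in zip(numerators, denominators)]
    # Cauchy's "mean fraction": sum of numerators over sum of denominators
    mean = sum(numerators) / sum(denominators)
    overall = (f(X) - f(x0)) / (X - x0)
    return min(quotients), mean, max(quotients), overall

lo, mean, hi, overall = lemma_demo(lambda x: x**3, 0, 2, 8)
print(lo, mean, hi, overall)
```

The mean fraction always lands between the smallest and largest individual quotients, and because the numerators telescope it equals the overall quotient $\frac{f(X) - f(x_0)}{X - x_0}$ exactly.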

Cauchy then used the lemma to prove his mean value theorem. 
Theorem: If the function $f$ and its derivative $f'$ are continuous between $x_0$ and $X$, then for some $\theta$ between 0 and 1, we have

$$\frac{f(X) - f(x_0)}{X - x_0} = f'[x_0 + \theta(X - x_0)].$$

Proof: The assumed continuity of $f'$ guarantees, by the general version of the intermediate value theorem, that $f'$ must take any value between its least ($A$) and its greatest ($B$). But according to the lemma, the number $\frac{f(X) - f(x_0)}{X - x_0}$ is one such intermediate value, and so, as Cauchy put it, “there exists between the limits 0 and 1 a value of $\theta$ sufficient to satisfy the equation”

$$\frac{f(X) - f(x_0)}{X - x_0} = f'[x_0 + \theta(X - x_0)]. \qquad (4)$$

Q.E.D.


The conclusion in (4) differs from what we find in a modern textbook only in the notational convention that replaces Cauchy's $x_0 + \theta(X - x_0)$ by our $c$, where of course $0 < \theta < 1$ implies $x_0 < c < X$.
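Cauchy’s route to $\theta$ runs through the intermediate value theorem applied to $f'$, and that route can be followed numerically. This sketch assumes $f'$ is continuous and that $f' - q$ actually changes sign on a sampling grid, where $q$ is the difference quotient; a zero at which $f' - q$ merely touches 0 could be missed. The name `mvt_theta` and its parameters are hypothetical. For $f(x) = x^3$ on $[0, 2]$ the quotient is 4, and solving $3c^2 = 4$ gives $\theta = 1/\sqrt{3}$.

```python
def mvt_theta(f, fprime, x0, X, samples=1000, tol=1e-12):
    """Find theta in (0, 1) with (f(X) - f(x0))/(X - x0) = f'(x0 + theta (X - x0)).

    Assumes f' is continuous, so f' - q takes the value 0 between its extremes.
    A sign change of f' - q is located on a grid, then refined by bisection."""
    q = (f(X) - f(x0)) / (X - x0)          # Cauchy's difference quotient

    def g(t):
        return fprime(x0 + t * (X - x0)) - q

    ts = [k / samples for k in range(samples + 1)]
    for a, b in zip(ts, ts[1:]):
        if 0 < a and g(a) == 0:
            return a
        if g(a) * g(b) < 0:
            # bisection keeps a sign change inside [a, b]
            while b - a > tol:
                mid = (a + b) / 2
                if g(a) * g(mid) <= 0:
                    b = mid
                else:
                    a = mid
            return (a + b) / 2
    raise ValueError("no sign change of f' - q found on the sampling grid")

theta = mvt_theta(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0)
print(theta)
```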

So, this is the mean value theorem for derivatives, albeit proved under 
Cauchy’s assumption that the derivative is continuous, an assumption 
made to guarantee that f’ takes all intermediate values between A and B. In 
fact, this assumption is unnecessary, and modern proofs of the mean value 
theorem get along quite nicely without it. Moreover, it turns out that 
derivatives take intermediate values whether or not they are continuous, a 
striking result we shall prove in chapter 10. 

In the 1820s, these finer points were unclear, and Cauchy’s insight,
significant for its time, would not be the final word. Nevertheless, he had 
identified the mean value theorem as central to a rigorous development of 
the calculus, a position it retains to this day. 


INTEGRALS AND THE FUNDAMENTAL THEOREM OF CALCULUS 


Like Cauchy’s approach to limits, his definition of the integral would 
reverberate through the history of calculus. We recall that Leibniz had 
defined the integral as a sum of infinitely many infinitesimal summands 


and chose the notation $\int$ to suggest this. Strange as it may seem, by 1800
integration was no longer perceived in this light. Rather, it had come to
be regarded primarily as the inverse of differentiation, occupying a sec- 
ondary position in the pantheon of mathematical concepts. Euler, for 
instance, began his influential three-volume text on integral calculus with 
the following: 


Definition: Integral calculus is the method of finding, from a 
given differential, the quantity itself; and the operation which pro- 
duces this is generally called integration. [13] 


Euler thought of integration as dependent upon, and hence subservient 
to, differentiation. 

Cauchy disagreed. He believed the integral must have an independent 
existence and defined it accordingly. He thereby initiated a transformation 
that, as the nineteenth century wore on, would catapult integration into 
the analytic spotlight. 

He began with a function $f$ continuous on the interval between $x_0$ and $X$ [14]. Although continuity was critical to his definition, Cauchy pointedly did not assume that $f$ was the derivative of some other function. He subdivided the interval into what he called “elements” $x_1 - x_0, x_2 - x_1, x_3 - x_2, \ldots, X - x_{n-1}$ and let

$$S = (x_1 - x_0) f(x_0) + (x_2 - x_1) f(x_1) + (x_3 - x_2) f(x_2) + \cdots + (X - x_{n-1}) f(x_{n-1}).$$


We recognize this as a sum of left-hand rectangular areas, but in his Calcul 
infinitésimal, Cauchy made no mention of the geometry of the situation 
nor did he provide the now-customary diagram. He did, however, observe 
that “the quantity S clearly depends on: (1) the number n of elements into 
which we have divided the difference $X - x_0$; (2) the values of these elements and, as a consequence, the mode of division adopted.” Further, he
claimed that “it is important to note that, if the numerical values of the 
elements differ very little and the number n is quite large, then the manner 
of division will have an imperceptible effect on the value of S.” 

Cauchy gave an argument in support of this last assertion, one that assumed uniform continuity (“one $\delta$ fits all”) without recognizing it. In this way, he believed he had proved the following result:


If we decrease indefinitely the numerical values of these elements [that is, of $x_1 - x_0, x_2 - x_1, x_3 - x_2, \ldots, X - x_{n-1}$] while augmenting their number, the value of S . . . ends by attaining a certain limit that depends uniquely on the form of the function f(x) and the extreme values $x_0$ and $X$ attained by the variable x. This limit is what we call a definite integral.


He followed Joseph Fourier (1768-1830) in adopting $\int_{x_0}^{X} f(x)\,dx$ as “the
most simple” notation for the limit in question. 

Cauchy’s definition was far from perfect, in large measure because it 
applied only to continuous functions. Still, it was a highly significant 
development that left no doubt about two critical points: (1) the integral 
was a limit and (2) its existence had nothing to do with antidifferentiation. 

As was his custom, Cauchy used the definition to prove basic 
results. Some were general rules, such as the fact that the integral of the 
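sum is the sum of the integrals. Others were specific formulas like

$$\int_{x_0}^{X} x\,dx = \frac{X^2 - x_0^2}{2} \quad\text{and}\quad \int_{x_0}^{X} \frac{dx}{x} = \ln\left(\frac{X}{x_0}\right).$$

And Cauchy established that, for $f$ continuous, there exists a value of $\theta$ between 0 and 1 for which

$$\int_{x_0}^{X} f(x)\,dx = (X - x_0)\, f[x_0 + \theta(X - x_0)]. \qquad (5)$$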


Readers will recognize this as the mean value theorem for integrals. 
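Cauchy’s sums $S$ can be computed for two different modes of division to see his claim that the manner of division ceases to matter once the elements are small. In this sketch, the geometric division (successive points in a constant ratio) is an illustrative choice, the interval $[1, 3]$ and $n = 20000$ are arbitrary, and the function names are hypothetical. Both divisions approach $(X^2 - x_0^2)/2$ and $\ln(X/x_0)$, the values of Cauchy’s two sample integrals.

```python
import math

def cauchy_sum(f, points):
    """Cauchy's S: (x1 - x0) f(x0) + (x2 - x1) f(x1) + ... + (X - x_{n-1}) f(x_{n-1})."""
    return sum((b - a) * f(a) for a, b in zip(points, points[1:]))

def uniform_division(x0, X, n):
    """n equal elements from x0 to X."""
    return [x0 + (X - x0) * k / n for k in range(n)] + [X]

def geometric_division(x0, X, n):
    """n elements whose endpoints stand in a constant ratio (requires 0 < x0 < X)."""
    q = (X / x0) ** (1 / n)
    return [x0 * q**k for k in range(n)] + [X]

x0, X, n = 1.0, 3.0, 20000
samples = [
    ("x", lambda x: x, (X**2 - x0**2) / 2),
    ("1/x", lambda x: 1 / x, math.log(X / x0)),
]
for name, f, exact in samples:
    s_uniform = cauchy_sum(f, uniform_division(x0, X, n))
    s_geometric = cauchy_sum(f, geometric_division(x0, X, n))
    print(name, s_uniform, s_geometric, exact)
```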

Only then, having come this far without even mentioning derivatives, 
was Cauchy ready to bind together the great ideas of differentiation and 
integration. The unifying result is what we call the fundamental theorem 
of calculus. As one of the great theorems in all of mathematics, proved by 
one of the great analysts of all time, it surely deserves our attention [15]. 

As usual, Cauchy began with a continuous function f, but this time, in 
considering its integral, he let the upper limit of integration vary. That is, 


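he defined the function $\Phi(x) = \int_{x_0}^{x} f(x)\,dx$, although in the interest of clarity we now would write $\Phi(x) = \int_{x_0}^{x} f(t)\,dt$. Cauchy argued that

$$\begin{aligned}
\Phi(x + \alpha) - \Phi(x) &= \int_{x_0}^{x + \alpha} f(x)\,dx - \int_{x_0}^{x} f(x)\,dx \\
&= \int_{x_0}^{x} f(x)\,dx + \int_{x}^{x + \alpha} f(x)\,dx - \int_{x_0}^{x} f(x)\,dx \\
&= \int_{x}^{x + \alpha} f(x)\,dx.
\end{aligned}$$

Moreover, by (5), there exists $\theta$ between 0 and 1 for which

$$\int_{x}^{x + \alpha} f(x)\,dx = [(x + \alpha) - x]\, f[x + \theta((x + \alpha) - x)] = \alpha f(x + \theta\alpha).$$

In short, $\Phi(x + \alpha) - \Phi(x) = \alpha f(x + \theta\alpha)$ for some value of $\theta$.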
To Cauchy, this last equation showed that $\Phi$ was continuous because an infinitely small increase in $x$ produces an infinitely small increase in $\Phi$. Or, as we might put it,

$$\lim_{\alpha \to 0}[\Phi(x + \alpha) - \Phi(x)] = \lim_{\alpha \to 0} \alpha f(x + \theta\alpha) = \lim_{\alpha \to 0} \alpha \cdot \lim_{\alpha \to 0} f(x + \theta\alpha) = \lim_{\alpha \to 0} \alpha \cdot f\left(\lim_{\alpha \to 0}[x + \theta\alpha]\right) = 0 \cdot f(x) = 0,$$

where the continuity of $f$ at $x$ implies $\lim_{\alpha \to 0} f(x + \theta\alpha) = f(x)$. Consequently, $\lim_{\alpha \to 0} \Phi(x + \alpha) = \Phi(x)$ and so $\Phi$ is continuous at $x$.


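But Cauchy was after bigger game, for it also followed that

$$\Phi'(x) = \lim_{\alpha \to 0} \frac{\Phi(x + \alpha) - \Phi(x)}{\alpha} = \lim_{\alpha \to 0} \frac{\alpha f(x + \theta\alpha)}{\alpha} = \lim_{\alpha \to 0} f(x + \theta\alpha) = f(x).$$

Just to be sure no one missed the point, Cauchy rephrased this as

$$\frac{d}{dx} \int_{x_0}^{x} f(x)\,dx = f(x). \qquad (6)$$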


This is the “first version” of the fundamental theorem of calculus. In equa- 
tion (6), the inverse nature of differentiation and integration jumps right 
off the page. 
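The first version of the fundamental theorem says that $[\Phi(x + \alpha) - \Phi(x)]/\alpha$, which Cauchy showed equals $f(x + \theta\alpha)$, tends to $f(x)$. The sketch below approximates the integral over $[x, x + \alpha]$ by Cauchy-style left-hand sums (an assumption about how to evaluate $\Phi$ numerically; the names are hypothetical) and watches the error shrink for $f = \cos$ at $x = 1$.

```python
import math

def left_sum(f, a, b, n):
    """Cauchy's sum S over n equal elements of [a, b] (left-hand rectangles)."""
    h = (b - a) / n
    return sum(h * f(a + k * h) for k in range(n))

def increment_ratio(f, x, alpha, n=1000):
    """[Phi(x + alpha) - Phi(x)] / alpha, using Cauchy's step
    Phi(x + alpha) - Phi(x) = integral of f from x to x + alpha,
    with that integral approximated by a left-hand sum."""
    return left_sum(f, x, x + alpha, n) / alpha

x = 1.0
errors = [abs(increment_ratio(math.cos, x, alpha) - math.cos(x))
          for alpha in (0.1, 0.01, 0.001)]
print(errors)   # shrinking roughly in proportion to alpha
```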

Having differentiated the integral, Cauchy next showed how to inte- 
grate the derivative. He began with a simple but important result that he 
called a “problem.” 


Problem: If $\omega$ is a function whose derivative is everywhere zero, then $\omega$ is constant.


Proof: We fix $x_0$ in the function’s domain. If $x$ is another point in the domain, the mean value theorem (4) guarantees a $\theta$ between 0 and 1 such that

$$\frac{\omega(x) - \omega(x_0)}{x - x_0} = \omega'[x_0 + \theta(x - x_0)] = 0,$$

and so $\omega(x) = \omega(x_0)$. Cauchy continued, “If one designates by $c$ the constant quantity $\omega(x_0)$, then $\omega(x) = c$” for all $x$. In short, $\omega$ is constant as required. Q.E.D.


He was now ready for the second version of the fundamental theorem. 
Cauchy assumed that $f$ is continuous and that $F$ is a function with $F'(x) = f(x)$ for all $x$. If $\Phi(x) = \int_{x_0}^{x} f(x)\,dx$, he knew from (6) that $\Phi'(x) = f(x)$. Letting $\omega(x) = \Phi(x) - F(x)$, Cauchy reasoned that

$$\omega'(x) = \Phi'(x) - F'(x) = f(x) - f(x) = 0.$$

Thus there is a constant $c$ with $c = \omega(x) = \Phi(x) - F(x)$. He substituted $x = x_0$ into this last equation to get

$$c = \Phi(x_0) - F(x_0) = \int_{x_0}^{x_0} f(x)\,dx - F(x_0) = 0 - F(x_0) = -F(x_0).$$

It follows that $\int_{x_0}^{x} f(x)\,dx = \Phi(x) = F(x) + c = F(x) - F(x_0)$. After changing the upper limit of integration to $X$, Cauchy had what he wanted:

$$\int_{x_0}^{X} f(x)\,dx = F(X) - F(x_0). \qquad (7)$$


[Facsimile: Cauchy’s proof of the fundamental theorem of calculus (1823)]
To see the inverse relationship, we need only replace $f(x)$ by $F'(x)$ and write (7) as $\int_{x_0}^{X} F'(x)\,dx = F(X) - F(x_0)$. This version of the fundamental theorem integrates the derivative, thereby complementing its predecessor.

So, when integrating a continuous function $f$ across the interval from $x_0$ to $X$, we can short-circuit Cauchy's intricate definition with its “elements” and sums and limits, provided we find an antiderivative $F$. In this happy circumstance, evaluating the integral becomes nothing more than substituting $x_0$ and $X$ into $F$. One could argue that (7) represents the greatest shortcut in all of mathematics.
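To make the shortcut tangible, here is a small numerical sketch of our own (not Cauchy's): it approximates $\int_0^3 x^2\,dx$ by left-endpoint sums, the kind of sum underlying Cauchy's definition, and compares them with $F(3) - F(0)$ for the antiderivative $F(x) = x^3/3$. The helper names `left_sum`, `f`, and `F` are hypothetical.

```python
# Illustrative sketch; the helper names left_sum, f, and F are ours, not Cauchy's.

def left_sum(f, x0, X, n):
    """Left-endpoint sum of f over n equal subintervals of [x0, X]."""
    h = (X - x0) / n
    return sum(f(x0 + i * h) for i in range(n)) * h

def f(x):
    return x * x

def F(x):
    # an antiderivative of f, since F'(x) = x**2
    return x ** 3 / 3

for n in (10, 100, 10_000):
    print(n, left_sum(f, 0.0, 3.0, n))   # creeps up toward 9 as n grows

print(F(3.0) - F(0.0))                   # the shortcut (7): exactly 9.0
```

The sums need ever more work to approach 9, while the antiderivative delivers it in one subtraction.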

Although the fundamental theorem is a fitting capstone to any rigorous 
development of calculus, we end this chapter in yet another corner of analy- 
sis where Cauchy made a significant impact: the realm of infinite series. 


TWO CONVERGENCE TESTS


Like Newton, Leibniz, and Euler before him, Cauchy was a master of 
infinite series. But unlike these predecessors, he recognized the need to 
treat questions of convergence/divergence with care, lest divergent series 
lead mathematicians astray. If Cauchy held such a position, it seemed 
incumbent upon him to supply tests for convergence, and on this front he 
did not disappoint. 

First we must say a word about Cauchy’s definition of the sum of an infinite series. Earlier mathematicians, who could be amazingly clever in evaluating specific series, tended to treat these holistically, as single expressions that behaved more or less like their finite counterparts. To Cauchy, the meaning of $\sum_{k=0}^{\infty} u_k$ was more subtle. It required a precise definition in order to determine not only its value but its very existence.
His approach is now familiar. Cauchy introduced the sequence of partial sums

$$S_1 = u_0, \quad S_2 = u_0 + u_1, \quad S_3 = u_0 + u_1 + u_2, \quad \text{and generally} \quad S_n = \sum_{k=0}^{n-1} u_k.$$

Then the value of the infinite series was defined to be the limit of this sequence, that is,

$$\sum_{k=0}^{\infty} u_k = \lim_{n \to \infty} S_n = \lim_{n \to \infty} \sum_{k=0}^{n-1} u_k,$$

provided the limit exists, in which case “the series will be called convergent and the limit . . . will be called the sum of the series” [16]. As he had done with derivatives and integrals, Cauchy erected a theory of infinite series upon the bedrock of limits.
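Cauchy's definition translates directly into a computation. The sketch below (the helper name `partial_sums` is ours) lists the partial sums $S_n = \sum_{k=0}^{n-1} u_k$ for the geometric series with $u_k = (1/2)^k$, for which $S_n = 2 - 2^{1-n}$, so the partial sums approach the sum 2.

```python
from itertools import accumulate

# Illustrative sketch; partial_sums is a hypothetical helper name.

def partial_sums(u, n):
    """Return [S_1, S_2, ..., S_n], where S_m = u(0) + ... + u(m - 1)."""
    return list(accumulate(u(k) for k in range(n)))

S = partial_sums(lambda k: 0.5 ** k, 10)
print(S[:3])   # [1.0, 1.5, 1.75], i.e. S_1, S_2, S_3
print(S[-1])   # 1.998046875, i.e. S_10 = 2 - 2**-9
```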

It was an ingenious idea, although in the process Cauchy committed an error of omission. From time to time, he asserted the existence of the limit of a sequence of partial sums based on the fact that the partial sums grew ever closer to one another. By this last statement he meant that, for any $\varepsilon > 0$, there is an index $N$ so that the difference between $S_N$ and $S_{N+k}$ is less than $\varepsilon$ for all $k \ge 1$. In his honor, we now call a sequence with this property a “Cauchy sequence.”
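A computer cannot verify the condition “for all $k \ge 1$,” but it can check a finite window of $k$, which is enough to see the idea at work. The hypothetical helper `cauchy_index` below returns the first $N$ with $|S_{N+k} - S_N| < \varepsilon$ for $k = 1, \ldots, 50$; the window size is our arbitrary assumption, so the answer is evidence rather than proof. For $u_k = 2^{-k}$ we have $S_{N+k} - S_N = 2^{1-N}(1 - 2^{-k})$, so with $\varepsilon = 10^{-3}$ the first qualifying index is $N = 11$.

```python
from itertools import accumulate

# Illustrative sketch; cauchy_index, window, and max_n are our own names and choices.

def cauchy_index(u, eps, window=50, max_n=1000):
    """First N with |S_{N+k} - S_N| < eps for k = 1..window (a finite check only)."""
    # S[m] = u(0) + ... + u(m - 1), with S[0] = 0
    S = [0.0, *accumulate(u(k) for k in range(max_n + window))]
    for n in range(1, max_n + 1):
        if all(abs(S[n + k] - S[n]) < eps for k in range(1, window + 1)):
            return n
    return None  # no such N found up to max_n

print(cauchy_index(lambda k: 0.5 ** k, 1e-3))  # 11
```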

However, he offered no justification for the idea that terms growing 
arbitrarily close to one another must necessarily converge to some limit. As 
noted above, this condition is an alternative version of the completeness 
property, the logical foundation upon which the theory of limits, and hence 
the theory of calculus, now rests. To modern mathematicians, complete- 
ness must be addressed either by deriving it from a more elementary defi- 
nition of the real numbers or by adopting it as an axiom. One could argue 
that Cauchy more or less did the latter, although there is a difference 
between assuming something explicitly (as an axiom) and assuming it 
implicitly (as a gaffe). 

In any case, he treated as self-evident the fact that a Cauchy sequence 
is convergent. There is an irony here, for we now attach his name to a con- 
cept he did not fully comprehend. But rather than diminish his status, this 
irony reinforces our previous observation that difficult ideas take time to 
reach maturity. 

With that prologue, we now consider a pair of tests with which Cauchy demonstrated the convergence of infinite series. Both proofs are based on the comparison test for a series of nonnegative terms, which says that if $0 \le a_k \le b_k$ for all $k$ and if $\sum_{k=0}^{\infty} b_k$ converges, then so does $\sum_{k=0}^{\infty} a_k$. Today the comparison test is proved by means of the aforementioned completeness property, and it remains one of the easiest ways to establish series convergence.
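For readers who want to see where completeness enters, here is a common modern argument, sketched with partial sums $A_n$ and $B_n$ (our notation):

$$0 \le A_n = \sum_{k=0}^{n-1} a_k \le \sum_{k=0}^{n-1} b_k = B_n \le \sum_{k=0}^{\infty} b_k, \qquad A_{n+1} = A_n + a_n \ge A_n.$$

Thus $(A_n)$ is nondecreasing and bounded above, and the completeness property guarantees that it has a limit, which is by definition $\sum_{k=0}^{\infty} a_k$.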
The first of our results, the root test, he stated in the following words. 


Theorem: For the infinite series $u_0 + u_1 + u_2 + u_3 + \cdots + u_n + \cdots$, find the limit or limits to which the expression $|u_n|^{1/n} = \sqrt[n]{|u_n|}$ converges and let $\lambda$ be the greatest of these. Then the series converges if $\lambda < 1$ and diverges if $\lambda > 1$ [17].
Before proceeding, we should clarify a few points. For one, Cauchy did not use the absolute value notation, as we have. Rather, he talked about $\rho_n$ as the “numerical value” or the “modulus” of $u_n$ and framed the root test in terms of $\rho_n$. Of course, this is just a symbolic convention, not a substantive difference.

Perhaps less familiar is his reference to $\lambda$ as the “greatest” of the limits. Again, we now have a term for this, the limit supremum, and we write $\lambda = \limsup_{n \to \infty} |u_n|^{1/n}$ or $\lambda = \overline{\lim}_{n \to \infty} |u_n|^{1/n}$ in place of Cauchy’s verbal description.
For readers unfamiliar with the concept, an example may be useful. Suppose we consider the infinite series

$$\sum_{k=0}^{\infty} u_k = 1 + \frac{1}{3} + \frac{1}{4} + \frac{1}{27} + \frac{1}{16} + \frac{1}{243} + \frac{1}{64} + \frac{1}{2187} + \cdots,$$

where reciprocals of certain powers of 3 alternate with those of certain powers of 2. We see that the series terms $u_0, u_1, u_2, u_3, \ldots$ obey the pattern:

$$u_{2k} = \frac{1}{2^{2k}} \quad \text{and} \quad u_{2k+1} = \frac{1}{3^{2k+1}} \quad \text{for } k = 0, 1, 2, \ldots.$$

If we look only at terms with even subscripts, we find the limit of their roots to be $\lim_{k \to \infty} \sqrt[2k]{1/2^{2k}} = 1/2$, whereas if we restrict ourselves to terms with odd subscripts, we have $\lim_{k \to \infty} \sqrt[2k+1]{1/3^{2k+1}} = 1/3$. In modern parlance, the sequence $\{|u_n|^{1/n}\}$ has a subsequence converging to $1/2$ and another converging to $1/3$. In this case, the greater is $\lambda = 1/2$.
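The two subsequential limits are easy to check numerically. In the sketch below (the helper name `u` is ours), `u(k)` encodes the pattern $u_{2k} = 1/2^{2k}$, $u_{2k+1} = 1/3^{2k+1}$, and we print $|u_k|^{1/k}$ for a few indices: the even ones equal $1/2$ and the odd ones equal $1/3$, up to rounding.

```python
# Illustrative sketch; the helper name u is ours.

def u(k):
    """k-th term of 1 + 1/3 + 1/4 + 1/27 + 1/16 + 1/243 + ... (k = 0, 1, 2, ...)."""
    return 1 / 2 ** k if k % 2 == 0 else 1 / 3 ** k

for k in range(1, 11):
    print(k, abs(u(k)) ** (1 / k))   # about 0.5 for even k, about 0.333 for odd k
```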


Cauchy’s proof of the root test in Calcul infinitésimal is virtually identical to that found in a modern text. He began with the case where $0 \le \lambda < 1$ and fixed a number $\mu$ so that $\lambda < \mu < 1$. His critical observation was that the “greatest values” of $|u_n|^{1/n}$ “cannot approach indefinitely the limit $\lambda$ without eventually becoming less than $\mu$.” As a consequence, he knew there was an integer $m$ such that, for all $k \ge m$, we have $|u_k|^{1/k} < \mu$ and so $|u_k| < \mu^k$. He then considered the two infinite series

$$|u_m| + |u_{m+1}| + |u_{m+2}| + \cdots \quad \text{and} \quad \mu^m + \mu^{m+1} + \mu^{m+2} + \cdots,$$

where the geometric series on the right converges because $\mu < 1$. From the comparison test, Cauchy deduced the convergence of $\sum_{k=0}^{\infty} |u_k|$, and thus of $\sum_{k=0}^{\infty} u_k$ as well, since an absolutely convergent series converges. In short, if $\lambda < 1$, the series converges. It follows, for instance, that the series

$$1 + \frac{1}{3} + \frac{1}{4} + \frac{1}{27} + \frac{1}{16} + \frac{1}{243} + \frac{1}{64} + \frac{1}{2187} + \cdots$$

converges because $\lambda = 1/2$.

His proof of the divergence case ($\lambda > 1$) was analogous. To demonstrate the importance of the root test, Cauchy applied it to determine what we now call the radius of convergence of the Maclaurin series

$$\sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!}\,x^k,$$

and from there a rigorous theory of power series was on its way.
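For the example series, the comparison step of the proof can be made concrete. Taking $\mu = 3/5$ (our choice; any $\mu$ with $1/2 < \mu < 1$ would serve), every term from $k = 1$ on satisfies $|u_k| \le \mu^k$. The sketch also sums the series exactly with fractions: splitting it into two geometric series gives $4/3 + 3/8 = 41/24$, and a long partial sum agrees with that value to machine precision.

```python
from fractions import Fraction

# Illustrative sketch; the helper name u and the choice mu = 3/5 are ours.

def u(k):
    """k-th term of 1 + 1/3 + 1/4 + 1/27 + ..., as an exact fraction."""
    return Fraction(1, 2 ** k) if k % 2 == 0 else Fraction(1, 3 ** k)

mu = Fraction(3, 5)   # any mu with 1/2 < mu < 1 would serve

# the comparison |u_k| <= mu**k used in the proof, checked for k = 1, ..., 199
print(all(u(k) <= mu ** k for k in range(1, 200)))   # True

S_200 = sum(u(k) for k in range(200))                # partial sum S_200, computed exactly
print(float(S_200), float(Fraction(41, 24)))         # the two values agree
```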


There are other tests of convergence scattered through Cauchy’s collected works, such as the ratio test (credited to d’Alembert) and the Cauchy condensation test [18]. The latter begins with a series $\sum_{k=0}^{\infty} u_k$, where $u_0 \ge u_1 \ge u_2 \ge \cdots \ge 0$ is a nonincreasing sequence of nonnegative terms. Cauchy proved that the original series and the “condensed” series $u_0 + 2u_1 + 4u_3 + 8u_7 + \cdots + 2^k u_{2^k - 1} + \cdots$ converge or diverge together. In this case, selected multiples of a subcollection of terms tell us all we need to know about the behavior of the original infinite series. It seems too good to be true.
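The condensed terms $2^k u_{2^k - 1}$ are easy to tabulate. In the sketch below (helper names are ours), the harmonic series $u_k = 1/(k+1)$ condenses to $1 + 1 + 1 + \cdots$, which diverges, while $u_k = 1/(k+1)^2$ condenses to the convergent geometric series $1 + 1/2 + 1/4 + \cdots$; by Cauchy's theorem the original series share these fates.

```python
from fractions import Fraction

# Illustrative sketch; condensed_terms, harmonic, and squares are hypothetical names.

def condensed_terms(u, K):
    """The first K terms 2**k * u(2**k - 1) of the condensed series."""
    return [2 ** k * u(2 ** k - 1) for k in range(K)]

def harmonic(k):
    return Fraction(1, k + 1)

def squares(k):
    return Fraction(1, (k + 1) ** 2)

print(condensed_terms(harmonic, 6))  # six terms, each equal to 1
print(condensed_terms(squares, 6))   # 1, 1/2, 1/4, 1/8, 1/16, 1/32
```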

We conclude this section with a lesser-known convergence test from Cauchy’s arsenal, one that demonstrates his endless fascination with this topic [19].

Theorem: If $\sum_{k=1}^{\infty} u_k$ is a series of positive terms for which $\lim_{k \to \infty} \frac{\ln(u_k)}{\ln(1/k)} = h > 1$, then the series converges.


Proof: As with the root test, Cauchy sought a “buffer” between 1 and $h$ and so chose a real number $a$ with $1 < a < h$. This guaranteed the existence of a positive integer $m$ (which we may take to be at least 2, so that $\ln(k) > 0$) such that $\frac{\ln(u_k)}{\ln(1/k)} > a$ for all $k \ge m$. From there, he observed that

$$\frac{\ln(u_k)}{\ln(1/k)} = \frac{-\ln(u_k)}{-\ln(k)} = \frac{\ln(1/u_k)}{\ln(k)}, \quad \text{and so} \quad a \ln(k) < \ln\!\left(\frac{1}{u_k}\right).$$

Exponentiating both sides of this inequality, he deduced that $k^a < 1/u_k$ and so $u_k < 1/k^a$ for all $k \ge m$. But $\sum_{k=m}^{\infty} 1/k^a$ (which is now called a $p$-series) converges because $a > 1$, and so the original series $\sum_{k=1}^{\infty} u_k$ converges by the comparison test. Q.E.D.


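As an example, consider $\sum_{k=2}^{\infty} \frac{\ln(k)}{k^p}$, where $p > 1$ (the $k = 1$ term is zero, so we begin at $k = 2$). Cauchy’s test requires us to evaluate $\lim_{k \to \infty} \frac{\ln[\ln(k)/k^p]}{\ln(1/k)}$, which suggests in turn that we first simplify the quotient:

$$\frac{\ln[\ln(k)/k^p]}{\ln(1/k)} = \frac{\ln[\ln(k)] - p\ln(k)}{-\ln(k)} = p - \frac{\ln[\ln(k)]}{\ln(k)}.$$

By l'Hospital's rule, $\lim_{k \to \infty} \frac{\ln[\ln(k)]}{\ln(k)} = \lim_{k \to \infty} \frac{1/(k \ln k)}{1/k} = \lim_{k \to \infty} \frac{1}{\ln k} = 0$, so $\lim_{k \to \infty} \left[p - \frac{\ln[\ln(k)]}{\ln(k)}\right] = p > 1$, establishing the convergence of $\sum_{k=2}^{\infty} \frac{\ln(k)}{k^p}$ by Cauchy’s test. It is a very nice result.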
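The limit in this example is approached very slowly, which a short computation makes visible. The sketch below (the function name is ours) evaluates $\ln(u_k)/\ln(1/k)$ for $u_k = \ln(k)/k^p$ with $p = 2$, computing $\ln(u_k) = \ln[\ln(k)] - p\ln(k)$ directly so that huge values of $k$ cause no floating-point underflow. At $k = 10^6$ the ratio is still only about 1.81, comfortably above 1 but well short of its limit 2.

```python
import math

# Illustrative sketch; cauchy_ratio is a hypothetical helper name.

def cauchy_ratio(k, p):
    """ln(u_k) / ln(1/k) for u_k = ln(k) / k**p, valid for k >= 2."""
    log_k = math.log(k)
    log_u = math.log(log_k) - p * log_k   # ln(u_k), computed without forming u_k
    return log_u / (-log_k)

for k in (10 ** 2, 10 ** 6, 10 ** 20, 10 ** 100):
    print(k, cauchy_ratio(k, 2.0))        # increases slowly toward p = 2
```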

Before leaving Augustin-Louis Cauchy, we offer an apology and a pre- 
view. We apologize for a chapter that reads like a précis of an introductory 
analysis text. Indeed, there is no stronger testimonial to Cauchy’s influ- 
ence than that his “greatest hits” are now the heart and soul of the subject. 
Building upon the idea of limit, he developed elementary real analysis in a 
way that remains the model to this day. As Bell properly observed, Cauchy 
stands at center stage, and it is for this reason that the present chapter is 
one of the book’s longest. It could hardly be otherwise. 

This brings us to the preview. None of these accolades should sug- 
gest that, after Cauchy, the quest was finished. On at least three fronts 
there was still work to be done, work that will occupy us in chapters to 
come. 
First, his definitions could be made more general and his proofs more
rigorous. A satisfactory definition of the integral, for instance, need not be 
limited to continuous functions, and the nagging issue of uniformity had 
to be identified and resolved. These tasks would fall largely to the German 
mathematicians Georg Friedrich Bernhard Riemann and Karl Weierstrass, 
who in a sense supplied the last word on mathematical precision. 

Second, Cauchy’s more theoretical approach to continuity, differen- 
tiability, and integrability motivated those who followed to sort out the 
connections among these concepts. Such connections would intrigue 
mathematicians throughout the nineteenth century, and their resulting 
theorems—and counterexamples—would hold plenty of surprises. 

Finally, the need to understand the completeness property raised 
questions about the very nature of the real numbers. The answers to these 
questions, combined with the arrival of set theory, would change the face 
of analysis, although no mathematician active in 1840 could know that a 
revolution lay just over the horizon. 

But any mathematician active in 1840 would have known about 
Cauchy. On this front, we shall give the last word to math historian Carl 
Boyer. In his classic study of the history of calculus, Boyer wrote, “Through 
[his] works, Cauchy did more than anyone else to impress upon the sub- 
ject the character which it bears at the present time” [20]. 

In a very real sense, all who followed are his disciples.


CHAPTER 7 




Riemann 


Georg Friedrich Bernhard Riemann 


By this point of our story, the “function” had assumed a central
importance in analysis. At first it may have seemed like a straightforward, 
even innocuous notion, but as the collection of functions grew ever more 
sophisticated—and ever more strange—mathematicians realized they had 
a conceptual tiger by the tail. 

To sketch this evolution, we return briefly to the origins. As we have 
seen, seventeenth-century scholars like Newton and Leibniz believed
that the raw material of their new subject was the curve, a concept 
rooted in the geometric/intuitive approach that later analysts would 
abandon. 

It was largely because of Euler that attention shifted from curves to 
functions. This significant change in viewpoint, dating from the publica- 
tion of his Introductio in analysin infinitorum, positioned real analysis as the 
study of functions and their behavior. 


Euler addressed this matter early in the Introductio. He first distinguished
between a constant quantity (one that “always keeps the same value”) and 
a variable quantity (“one which is not determined or is universal, which 
can take on any value”) and then adopted the following definition: “A 
function of a variable quantity is an analytic expression composed in 
any way whatsoever of the variable quantity and numbers or constant quantities” [1]. As examples he offered expressions like $a + 3z$, $az + b\sqrt{a^2 - z^2}$, and $c^z$.

These ideas were a huge improvement upon the “curve” and represented a triumph of algebra over geometry. However, his definition identified functions with analytic expressions, which is to say, functions with formulas. Such an identification painted mathematicians into some bizarre corners. For instance, the function

$$f(x) = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0, \end{cases}$$

as shown in figure 7.1, was considered “discontinuous” not because its graph jumped around but because its formula did. Of course, it is perfectly continuous by the modern (i.e., Cauchy’s) definition. Worse, as Cauchy observed, we could express the same function by a single formula $g(x) = \sqrt{x^2}$.
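Cauchy's observation can be spot-checked in a few lines. The sketch compares the two-case formula with the single formula $\sqrt{x^2}$ at sample points of our choosing (picked so that the squares are exact in floating point); it illustrates the identity rather than proving it.

```python
import math

# Illustrative sketch; f, g, and the sample points are our own choices.

def f(x):
    # the "discontinuous" two-case definition
    return x if x >= 0 else -x

def g(x):
    # Cauchy's single formula
    return math.sqrt(x ** 2)

samples = [-3.0, -0.5, 0.0, 0.25, 2.0, 7.5]
print([(x, f(x), g(x)) for x in samples])   # f and g agree at every sample
```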

There seemed to be ample reason to adopt a more liberal, and liberat- 
ing, view of what a function could be. Euler himself took a step in this 
direction a few years after providing the definition above. In his 1755 text 
on differential calculus, he wrote 


Those quantities that depend on others . . . , namely, those that undergo a change when others change, are called functions of these quantities. This definition applies rather widely and includes all ways in which one quantity can be determined by others. [2]


Figure 7.1


It is important to note that this time he made no explicit reference to ana- 
lytic expressions, although in his examples of functions Euler retreated to 
familiar formulas like y = x. 

As the eighteenth century became the nineteenth, functions were 
revisited in the study of real-world problems about vibrating strings and 
dissipating heat. This story has been told repeatedly (see, for instance, [3] 
and [4]), so we note here only that a key figure in the evolving discussion 
was Joseph Fourier. He came to believe that any function defined between 
−a and a (be it the position of a string, or the distribution of heat in a rod,
or something entirely “arbitrary”) could be expressed as what we now call 
a Fourier series: 


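$$f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{\infty} \left[a_k \cos\frac{k\pi x}{a} + b_k \sin\frac{k\pi x}{a}\right],$$

where the coefficients $a_k$ and $b_k$ are given by

$$a_k = \frac{1}{a}\int_{-a}^{a} f(x)\cos\frac{k\pi x}{a}\,dx \quad \text{and} \quad b_k = \frac{1}{a}\int_{-a}^{a} f(x)\sin\frac{k\pi x}{a}\,dx. \qquad (1)$$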
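To see formula (1) in action, the sketch below approximates $a_k$ and $b_k$ with the midpoint rule (our choice of quadrature; the function name is hypothetical) for $f(x) = x$ on $[-\pi, \pi]$, so that $a = \pi$. For this function an integration by parts gives $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$, which the printed values should match closely.

```python
import math

# Illustrative sketch; fourier_coefficients is our name, and the midpoint rule is our choice.

def fourier_coefficients(f, a, k, n=10_000):
    """Approximate a_k and b_k of formula (1) on [-a, a] with the midpoint rule."""
    h = 2 * a / n
    ak = bk = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * h          # midpoint of the i-th subinterval
        ak += f(x) * math.cos(k * math.pi * x / a) * h
        bk += f(x) * math.sin(k * math.pi * x / a) * h
    return ak / a, bk / a

for k in (1, 2, 3):
    ak, bk = fourier_coefficients(lambda x: x, math.pi, k)
    print(k, round(ak, 6), round(bk, 6), 2 * (-1) ** (k + 1) / k)
```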


To ensure that his readers were under no illusions about the level of gener-
ality, Fourier explained that his results applied to “a function completely 
arbitrary, that is to say, a succession of given values, subject or not to a 
common law,” and he went on to describe the values of y = f(x) as suc- 
ceeding one another “in any manner whatever, and each of them is given 
as if it were a single quantity” [5]. 

This statement extended the “late Euler” position that functions could 
take values at will across different points of their domain. On the other 
hand, it was by no means clear that the formulas in (1) always hold. The 
coefficients $a_k$ and $b_k$ are integrals, but how do we know that integrals of
general functions even make sense? At least implicitly, Fourier had raised 
the question of the existence of a definite integral, or, in modern terminol- 
ogy, of whether a function is or is not integrable. 

As it turned out, Fourier had badly overstated his case, for not every 
function can be expressed as a Fourier series nor integrated as required by 
(1). Further, in practice he restricted himself, as had Euler before him, to 
examples that were fairly routine and well behaved. If the concept of a truly
“arbitrary” function were to catch on, someone would have to exhibit one.
DIRICHLET’S FUNCTION


That somebody was Peter Gustav Lejeune-Dirichlet (1805-1859), a 
gifted mathematician who had studied with Gauss in Germany and 
with Fourier in France. Over his career, Dirichlet contributed to branches 
of mathematics ranging from number theory to analysis to that wonder- 
ful hybrid of the two called, appropriately enough, analytic number 
theory. 

Here we consider only a portion of Dirichlet’s 1829 paper “Sur la convergence des séries trigonométriques qui servent à représenter une fonction arbitraire entre des limites données” (On the Convergence of Trigonometric Series that Represent an Arbitrary Function between Given Limits) [6]. In
it, he returned to the representability of functions by a Fourier series like 
(1) and the implicit existence of those integrals determining the coeffi- 
cients. 

We recall that Cauchy defined his integral for functions continuous on an interval $[\alpha, \beta]$. Using what we now call “improper integrals,” Cauchy extended his idea to functions with finitely many points of discontinuity in $[\alpha, \beta]$. For instance, if $f$ is continuous except at a single point $r$ within $[\alpha, \beta]$, as shown in figure 7.2, Cauchy defined the integral as

$$\int_{\alpha}^{\beta} f(x)\,dx = \int_{\alpha}^{r} f(x)\,dx + \int_{r}^{\beta} f(x)\,dx = \lim_{t \to r^-} \int_{\alpha}^{t} f(x)\,dx + \lim_{t \to r^+} \int_{t}^{\beta} f(x)\,dx,$$

provided all limits exist. If $f$ has discontinuities at $r_1 < r_2 < r_3 < \cdots < r_n$, we define the integral analogously as

$$\int_{\alpha}^{\beta} f(x)\,dx = \int_{\alpha}^{r_1} f(x)\,dx + \int_{r_1}^{r_2} f(x)\,dx + \int_{r_2}^{r_3} f(x)\,dx + \cdots + \int_{r_n}^{\beta} f(x)\,dx.$$


Figure 7.2
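Here is a numerical sketch of Cauchy's recipe for a single discontinuity, using an example of our own: $f(x) = 1/\sqrt{|x|}$ on $[-1, 1]$, which blows up at $r = 0$. Cutting out $[-g_1, g_2]$ leaves two pieces worth $(2 - 2\sqrt{g_1}) + (2 - 2\sqrt{g_2})$, and each one-sided limit exists separately, so the integral equals 4. The midpoint rule and the gap parameters are illustrative assumptions, not part of Cauchy's definition.

```python
import math

# Illustrative sketch; midpoint, f, and pieces are hypothetical names.

def midpoint(f, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

def f(x):
    return 1 / math.sqrt(abs(x))   # unbounded, hence discontinuous, at r = 0

def pieces(alpha, r, beta, gap_left, gap_right):
    """Integral over [alpha, r - gap_left] plus integral over [r + gap_right, beta]."""
    return midpoint(f, alpha, r - gap_left) + midpoint(f, r + gap_right, beta)

for gap in (1e-1, 1e-2, 1e-4):
    # exact value of the two pieces is 2 * (2 - 2 * sqrt(gap)), which tends to 4
    print(gap, pieces(-1.0, 0.0, 1.0, gap, gap), 2 * (2 - 2 * math.sqrt(gap)))
```

Shrinking both gaps together is harmless here only because each one-sided limit exists on its own; Cauchy's definition takes the two limits independently, which is why `pieces` accepts separate gaps.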


However, if a function had infinitely many discontinuities in the inter- 
val [a, B], Cauchy’s integral was of no use. Dirichlet suggested that a new, 
more inclusive theory of integration might be crafted to handle such func- 
tions, a theory connected to “the fundamental principles of infinitesimal
analysis.” He never developed ideas in this direction nor did he show how 
to integrate highly discontinuous functions. He did, however, furnish an 
example to show that such things exist. 

“One supposes,” he wrote, “that $\varphi(x)$ equals a determined constant $c$ when the variable $x$ takes a rational value and equals another constant $d$ when the variable is irrational” [7]. This is what we now call Dirichlet’s function, written concisely as

$$\varphi(x) = \begin{cases} c & \text{if } x \text{ is rational}, \\ d & \text{if } x \text{ is irrational}. \end{cases} \qquad (2)$$


By the Fourier definition, $\varphi$ was certainly a function: to each $x$ there corresponded one $y$, even if the correspondence arose from no (obvious) analytic formula. But the function is impossible to graph because of the thorough intermixing of rationals and irrationals on the number line: between any two rationals there is an irrational and vice versa. The graph of $\varphi$ would thus jump back and forth between $c$ and $d$ infinitely often as we move through any interval, no matter how narrow. Such a thing cannot be drawn nor, perhaps, imagined.

Worse, $\varphi$ has no point of continuity (provided, of course, that $c \ne d$). This follows because of the same intermixing of rationals and irrationals. Recall that Cauchy had defined continuity of $\varphi$ at a point $x$ by $\lim_{i \to 0}[\varphi(x + i) - \varphi(x)] = 0$. As $i$ moves toward 0, it passes through an infinitude of rational and irrational points. As a consequence, $\varphi(x + i)$ jumps wildly back and forth, so that the limit in question not only fails to be zero but fails even to exist. Because this is the case for any $x$, the function has no point of continuity.

The significance of this example was twofold. First, it demonstrated 
that Fourier’s idea of an arbitrary function had teeth to it. Before Dirichlet, 
even those who advocated a more general concept of function had not, in 
the words of math historian Thomas Hawkins, “taken the implications of 
this idea seriously” [8]. Dirichlet, by contrast, showed that the world of 
functions was more vast than anyone had thought. Second, his example
suggested an inadequacy in Cauchy’s approach to the integral. Perhaps inte-
gration could be recast so as not to restrict mathematicians to integrating 
continuous functions or those with only finitely many discontinuity points. 

It was Dirichlet’s brilliant student, the abundantly named Georg 
Friedrich Bernhard Riemann (1826-1866), who took up this challenge. 
Riemann sought to define the integral without prior assumptions about 
how continuous a function must be. Divorcing integrability from continu- 
ity was a bold and provocative idea. 


THE RIEMANN INTEGRAL 


In his 1854 Habilitationsschrift, a high-level dissertation required of professors at German universities, Riemann stated the issue simply: “What is one to understand by $\int_a^b f(x)\,dx$?” [9]. Assuming $f$ to be bounded on $[a, b]$, he proceeded with his answer.

First, he took any sequence of values $a < x_1 < x_2 < \cdots < x_{n-1} < b$ within the interval $[a, b]$. Such a subdivision is now called a partition. He denoted the lengths of the resulting subintervals by $\delta_1 = x_1 - a$, $\delta_2 = x_2 - x_1$, $\delta_3 = x_3 - x_2$, and so on up to $\delta_n = b - x_{n-1}$ (here $x_0 = a$ and $x_n = b$). Riemann next let $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ be a sequence of values between 0 and 1; thus, for each $\varepsilon_k$, the number $x_{k-1} + \varepsilon_k \delta_k$ lies between $x_{k-1} + 0 \cdot \delta_k = x_{k-1}$ and $x_{k-1} + 1 \cdot \delta_k = x_{k-1} + (x_k - x_{k-1}) = x_k$. In other words, $x_{k-1} + \varepsilon_k \delta_k$ falls within the subinterval $[x_{k-1}, x_k]$. He then introduced

$$S = \delta_1 f(a + \varepsilon_1 \delta_1) + \delta_2 f(x_1 + \varepsilon_2 \delta_2) + \delta_3 f(x_2 + \varepsilon_3 \delta_3) + \cdots + \delta_n f(x_{n-1} + \varepsilon_n \delta_n).$$

The reader will recognize this as what we now (appropriately) call a Riemann sum. As illustrated in figure 7.3, it is the total of the areas of rectangles standing upon the various subintervals, where the $k$th rectangle has base $\delta_k$ and height $f(x_{k-1} + \varepsilon_k \delta_k)$.
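Riemann's sum can be transcribed almost symbol for symbol. In the sketch below (function and variable names are ours), `points` holds $a = x_0 < x_1 < \cdots < x_n = b$, `eps` holds the fractions $\varepsilon_k$, and each term is $\delta_k f(x_{k-1} + \varepsilon_k \delta_k)$. For $f(x) = x^2$ on $[0, 1]$ with 1000 equal subintervals and randomly chosen $\varepsilon_k$, the sum lands within $1/1000$ of $1/3$, whatever the random choices.

```python
import random

# Illustrative sketch; riemann_sum, points, and eps are hypothetical names.

def riemann_sum(f, points, eps):
    """S = sum of delta_k * f(x_{k-1} + eps_k * delta_k), with x_0 = a and x_n = b."""
    S = 0.0
    for k in range(1, len(points)):
        delta_k = points[k] - points[k - 1]
        S += delta_k * f(points[k - 1] + eps[k - 1] * delta_k)
    return S

n = 1000
points = [k / n for k in range(n + 1)]            # uniform partition of [0, 1]
eps = [random.random() for _ in range(n)]         # arbitrary choices in [0, 1)
print(riemann_sum(lambda x: x * x, points, eps))  # close to 1/3
```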

Riemann was now ready with his critical definition: 


If this sum has the property that, however the $\delta_k$ and $\varepsilon_k$ are chosen, it becomes infinitely close to a fixed value $A$ as the $\delta_k$ become infinitely small, then we call this fixed value $\int_a^b f(x)\,dx$. If the sum does not have this property, then $\int_a^b f(x)\,dx$ has no meaning [10].
Figure 7.3 (labels: $a$, $x_1$, $x_2$, $x_{k-1}$, $x_{k-1} + \varepsilon_k \delta_k$, $x_k$, $x_{n-1}$, $b$; a rectangle of base $\delta_k$ and height $f(x_{k-1} + \varepsilon_k \delta_k)$)


This is the first appearance of the Riemann integral, now featured promi- 
nently in any course in calculus and, most likely, in any introduction to 
real analysis. It is evident that this definition assumed nothing about conti- 
nuity. For Riemann, unlike for Cauchy, continuity was a nonissue. 

Returning to the function $f$ and the partition $a < x_1 < x_2 < \cdots < x_{n-1} < b$, Riemann introduced $D_1$ as the “greatest oscillation” of the function between $a$ and $x_1$. In his words, $D_1$ was “the difference between the greatest and least values [of $f$] in this interval.” Similarly, $D_2, D_3, \ldots, D_n$ were the greatest oscillations of $f$ over the subintervals $[x_1, x_2], [x_2, x_3], \ldots, [x_{n-1}, b]$, and he let $D$ be the difference between the maximum and minimum values of $f$ over the entire interval $[a, b]$. Clearly $D_k \le D$ because $f$ cannot oscillate more over a subinterval than it does across all of $[a, b]$.

A modern mathematician would define these oscillations with more 
care. Because f is assumed to be bounded, we know from the all-important 
completeness property that the set of real numbers $\{f(x) \mid x \in [x_{k-1}, x_k]\}$
has both a least upper bound and a greatest lower bound. We then let $D_k$
be the difference of these. In the mid-nineteenth century, however, this 
approach would not have been feasible, for the concepts of a least upper 
bound and a greatest lower bound—now called, respectively, a supremum 
and an infimum—rested upon vague geometrical intuition if they were 
perceived at all. 
Be that as it may, Riemann introduced the new sum

$$R = \delta_1 D_1 + \delta_2 D_2 + \delta_3 D_3 + \cdots + \delta_n D_n. \qquad (3)$$


R is the shaded area, determined by the difference between the function’s 
largest and smallest values over each subinterval, shown in figure 7.4. 
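The oscillation sum (3) can be estimated in the same spirit. The sketch below approximates each $D_k$ by sampling $f$ at evenly spaced points of the subinterval, endpoints included; this sampling is our assumption and gives the exact oscillation for monotone functions but only a lower estimate in general. For $f(x) = x^2$ on $[0, 1]$ with $n$ equal subintervals, the $D_k$ telescope and $R = 1/n$, which shrinks to zero as the partition is refined.

```python
# Illustrative sketch; oscillation and oscillation_sum are hypothetical names.

def oscillation(f, lo, hi, samples=101):
    """Approximate sup f - inf f on [lo, hi] from evenly spaced samples (endpoints included)."""
    values = [f(lo + (hi - lo) * j / (samples - 1)) for j in range(samples)]
    return max(values) - min(values)

def oscillation_sum(f, points):
    """R = delta_1*D_1 + ... + delta_n*D_n for the partition given by points."""
    return sum((points[k] - points[k - 1]) * oscillation(f, points[k - 1], points[k])
               for k in range(1, len(points)))

for n in (10, 100, 1000):
    points = [k / n for k in range(n + 1)]
    print(n, oscillation_sum(lambda x: x * x, points))   # about 1/n
```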

He next let $d > 0$ be a positive number and looked at all partitions of $[a, b]$ for which $\max\{\delta_1, \delta_2, \delta_3, \ldots, \delta_n\} \le d$. In words, he was considering those partitions for which even the widest subinterval is of length $d$ or less. Reverting to modern terminology, we define the norm of a partition to be the width of the partition’s biggest subinterval, so Riemann was here looking at all partitions with norm less than or equal to $d$. He then introduced $\Delta = \Delta(d)$ to be the “greatest value” of all sums $R$ in (3) arising from partitions with norm less than or equal to $d$. (Today we would define $\Delta(d)$ as a supremum.)

It was clear to Riemann that the integral $\int_a^b f(x)\,dx$ existed if and only if $\lim_{d \to 0} \Delta(d) = 0$. Geometrically, this means that as we take increasingly fine partitions of $[a, b]$, the largest shaded area in figure 7.4 will decrease to zero.

He then posed the critical question, “In which cases does a function 
allow integration and in which does it not?” As before, he was ready with an 
answer—what we now call the Riemann integrability condition—although 
the notational baggage became even heavier. Because of the importance of 
these ideas to the history of analysis, we follow along a little further. 


Figure 7.4 
First, he let $\sigma > 0$ be a positive number. For a given partition, he looked at those subintervals for which the oscillation of the function was greater than $\sigma$. To illustrate, we refer to figure 7.5, where we display the function, its shaded rectangles, and a value of $\sigma$ at the left. Comparing $\sigma$ to the heights of the rectangles, we see that on only the two subintervals $[x_1, x_2]$ and $[x_4, x_5]$ does the oscillation exceed $\sigma$. We shall call these “Type A” subintervals. The others, where the oscillation is less than or equal to $\sigma$, we call “Type B” subintervals. In figure 7.5, the subintervals of Type B are $[a, x_1]$, $[x_2, x_3]$, $[x_3, x_4]$, and $[x_5, b]$.

As a last convention, Riemann let $s = s(\sigma)$ be the combined length of the Type A subintervals for a given $\sigma$; that is, $s(\sigma) = \sum_{\text{Type A}} \delta_k$. For our example, $s(\sigma) = (x_2 - x_1) + (x_5 - x_4)$. With this notation behind him, Riemann was now ready to prove a necessary and sufficient condition that a bounded function on $[a, b]$ be integrable.


Figure 7.5 (axis labels $a$, $x_1$, $x_2$, $x_3$, $x_4$, $x_5$, $b$)
Riemann Integrability Condition: $\int_a^b f(x)\,dx$ exists if and only if, for any $\sigma > 0$, the combined length of the Type A subintervals can be made as small as we wish by letting $d \to 0$.


Admittedly, there is a lot going on here. In words, this says that $f$ is integrable if and only if, for any $\sigma$ no matter how small, we can find a norm so that, for all partitions of $[a, b]$ having a norm that small or smaller, the total length of the subintervals where the function oscillates more than $\sigma$ is negligible. We examine Riemann’s necessity and sufficiency proofs separately.
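A small example shows how $s(\sigma)$ behaves for a function that is integrable despite a discontinuity. Take (our choice) the step function equal to 0 for $x < 1/2$ and 1 for $x \ge 1/2$ on $[0, 1]$. Its oscillation on $[x_{k-1}, x_k]$ is 1 exactly when $x_{k-1} < 1/2 \le x_k$ and 0 otherwise, so for $\sigma = 1/2$ only one subinterval is of Type A and $s(1/2) = 1/n$ for $n$ equal pieces. By contrast, Dirichlet's function (2) with $c = 1$, $d = 0$ oscillates by 1 on every subinterval, so $s(1/2) = 1$. The sketch uses exact fractions and passes the oscillation in as a function rather than estimating it; all helper names are hypothetical.

```python
from fractions import Fraction

# Illustrative sketch; step_oscillation, dirichlet_oscillation, and s are hypothetical names.
HALF = Fraction(1, 2)

def step_oscillation(lo, hi):
    """Oscillation on [lo, hi] of the step function: 0 for x < 1/2, 1 for x >= 1/2."""
    return 1 if lo < HALF <= hi else 0

def dirichlet_oscillation(lo, hi):
    """Dirichlet's function (c = 1, d = 0) oscillates by 1 on every subinterval."""
    return 1

def s(sigma, points, oscillation):
    """Combined length of the Type A subintervals (oscillation > sigma)."""
    return sum(points[k] - points[k - 1]
               for k in range(1, len(points))
               if oscillation(points[k - 1], points[k]) > sigma)

for n in (10, 100, 1000):
    points = [Fraction(k, n) for k in range(n + 1)]
    print(n, s(HALF, points, step_oscillation), s(HALF, points, dirichlet_oscillation))  # 1/n and 1
```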


Necessity: If $\int_a^b f(x)\,dx$ exists and we fix a value of $\sigma > 0$, then $\lim_{d \to 0} s(\sigma) = 0$.


Proof: Riemann began with a partition of unspecified norm $d$ and considered $R = \delta_1 D_1 + \delta_2 D_2 + \delta_3 D_3 + \cdots + \delta_n D_n$ from (3). He noted that $R \ge \sum_{\text{Type A}} \delta_k D_k$, because the summation on the right includes the Type A terms and omits the others, all of which are nonnegative. But for each Type A subinterval, the oscillation of $f$ exceeds $\sigma$; this is, of course, how the Type A subintervals are identified in the first place. So, recalling the definition of $s(\sigma)$, we have

$$R \ge \sum_{\text{Type A}} \delta_k D_k \ge \sum_{\text{Type A}} \delta_k \sigma = \sigma \cdot \sum_{\text{Type A}} \delta_k = \sigma \cdot s(\sigma).$$

On the other hand, $R = \delta_1 D_1 + \delta_2 D_2 + \delta_3 D_3 + \cdots + \delta_n D_n \le \Delta(d)$ because $\Delta(d)$ is the greatest such value for all partitions having norm $d$ or less. Riemann combined this pair of inequalities to get $\sigma \cdot s(\sigma) \le R \le \Delta(d)$. Ignoring the middle term and dividing by $\sigma$, he concluded that

$$0 \le s(\sigma) \le \frac{\Delta(d)}{\sigma}. \qquad (4)$$

Recall that, in proving necessity, he had assumed that $f$ is integrable, and this in turn meant that $\Delta(d) \to 0$ as $d \to 0$. Because $\sigma$ was a fixed number, $\Delta(d)/\sigma \to 0$ as well. It follows from (4) that, as $d$ approaches zero, the value of $s(\sigma)$ must likewise go to zero. Q.E.D.
This was the conclusion Riemann sought: that the total length $s(\sigma)$ of subintervals where the function oscillates more than $\sigma$ can be made, as he wrote, “arbitrarily small with suitable values of $d$.” That was half the battle. Next in line was the converse.


Sufficiency: If for any $\sigma > 0$ we have $\lim_{d \to 0} s(\sigma) = 0$, then $\int_a^b f(x)\,dx$ exists.


Proof: This time Riemann began by noting that, for any $\sigma > 0$, we have

$$R = \delta_1 D_1 + \delta_2 D_2 + \delta_3 D_3 + \cdots + \delta_n D_n = \sum_{\text{Type A}} \delta_k D_k + \sum_{\text{Type B}} \delta_k D_k. \qquad (5)$$

Here he simply broke the summation into two parts, depending on whether the interval was of Type A (where the function oscillates more than $\sigma$) or of Type B (where it does not). He then treated these summands separately.

For the first, he recalled that $D_k \le D$, where $D$ was the oscillation of $f$ over the entire interval $[a, b]$. Thus,

$$\sum_{\text{Type A}} \delta_k D_k \le \sum_{\text{Type A}} \delta_k D = D \cdot \sum_{\text{Type A}} \delta_k = D \cdot s(\sigma). \qquad (6)$$

Meanwhile, for each Type B subinterval we know that $D_k \le \sigma$, and so

$$\sum_{\text{Type B}} \delta_k D_k \le \sum_{\text{Type B}} \delta_k \sigma = \sigma \cdot \sum_{\text{Type B}} \delta_k \le \sigma \cdot \sum_{k=1}^{n} \delta_k = \sigma(b - a), \qquad (7)$$

where we have replaced the sum of the lengths of the Type B subintervals with the larger value $b - a$, the sum of the lengths of all the subintervals. Riemann now assembled (5), (6), and (7) to get the inequality

$$R = \sum_{\text{Type A}} \delta_k D_k + \sum_{\text{Type B}} \delta_k D_k \le D \cdot s(\sigma) + \sigma(b - a). \qquad (8)$$

Because (8) holds for any positive $\sigma$, we can fix a value of $\sigma$ so that $\sigma(b - a)$ is as small as we wish. For this fixed value of $\sigma$, we recall the hypothesis that as $d \to 0$, then $s(\sigma)$ goes to zero as well. We thus can choose $d$ so that $D \cdot s(\sigma)$ is also small. From (8) it follows that the corresponding values of $R$ can be made arbitrarily small, and so the greatest of these, which Riemann called $\Delta(d)$, will likewise be arbitrarily small. This meant that $\lim_{d \to 0} \Delta(d) = 0$, which was Riemann’s way of saying that $f$ is integrable on $[a, b]$. Q.E.D.


This complicated argument has been taken intact from Riemann’s
1854 paper. Although notationally intricate, the fundamental idea is sim- 
ple: in order for a function to have a Riemann integral, its oscillations 
must be under control. A function that jumps too often and too wildly 
cannot be integrated. From a geometrical viewpoint, such a function 
would seem to have no definable area beneath it. 

The Riemann integrability condition is a handy device for showing when a bounded function is or is not integrable. Consider again Dirichlet’s function in (2). For the sake of specificity, we take $c = 1$ and $d = 0$ and restrict our attention to the unit interval $[0, 1]$. Then we have

$$\varphi(x) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational}. \end{cases}$$

The question is whether, by Riemann’s definition, the integral $\int_0^1 \varphi(x)\,dx$ exists.

As we have seen, the integrability condition replaces this question by one involving oscillations of the function. Suppose we let $\sigma = 1/2$ and consider any partition $0 < x_1 < x_2 < \cdots < x_{n-1} < 1$ and any resulting subinterval $[x_k, x_{k+1}]$. Because this subinterval, no matter how narrow, contains infinitely many rationals and infinitely many irrationals, the oscillation of $\varphi$ on $[x_k, x_{k+1}]$ is $1 - 0 = 1 > 1/2 = \sigma$. As a consequence, every subinterval of the partition is of Type A, and so $s(1/2) = \sum_{\text{Type A}} \delta_k = 1$, the entire length of $[0, 1]$. In short, $s(1/2) = 1$ for any partition of $[0, 1]$. Riemann’s condition required that, for $\varphi$ to be integrable, $s(1/2) = \sum_{\text{Type A}} \delta_k$ could be made as small as we wish by choosing suitably fine partitions of $[0, 1]$. But as we have seen, the value of $s(1/2)$ is 1 no matter how we tinker with the partition, so we surely cannot make it less than, say, 0.01. Because the integrability condition cannot be met, this function is not integrable.

According to Riemann, $\int_0^1 \varphi(x)\,dx$ is nonsense.

Intuitively, Dirichlet’s function is so thoroughly discontinuous that it 
cannot be integrated. This phenomenon raised a fundamental question: 
just how discontinuous can a function be and still be integrable by Rie- 
mann’s definition? Although this mystery would not be solved until the 
twentieth century, Riemann himself described a function that provided a 
tantalizing piece of evidence. 
RIEMANN’S PATHOLOGICAL FUNCTION


As noted, Riemann introduced no prior assumptions about continuity 
and thereby suggested that some very bizarre functions—those that “are 
discontinuous infinitely often,” as he put it—might be integrated. “As 
these functions are as yet nowhere considered,” he wrote, “it will be good 
to provide a specific example” [11]. 

First he let $((x)) = x - n$, where $n$ is the integer nearest to $x$. Thus, $((1.2)) =
((-1.8)) = 0.2$, whereas $((1.7)) = ((-1.3)) = -0.3$. If $x$ fell halfway between two
integers, like 4.5 or $-0.5$, then he set $((x)) = 0$. The graph of $y = ((x))$ appears
in figure 7.6. It is clear that the function has a jump discontinuity of
length 1 at each $x = \pm m/2$, where $m$ is an odd whole number.

Riemann next considered $y = ((2x))$, which “compressed” figure 7.6
horizontally and resulted in the graph of figure 7.7. Here jumps of length
1 occur at $x = \pm m/4$, where $m$ is an odd whole number.

This compression process continued with $y = ((3x))$, $y = ((4x))$, and so
on, until Riemann assembled these into the function of interest:

$$f(x) = \frac{((x))}{1} + \frac{((2x))}{4} + \frac{((3x))}{9} + \frac{((4x))}{16} + \cdots = \sum_{k=1}^{\infty} \frac{((kx))}{k^2}.$$

Figure 7.6

Figure 7.7
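Riemann’s bracket function is easy to experiment with. In this sketch, `dbl` is our hypothetical name for $((x))$; it works with exact fractions so that half-integers are detected reliably, and `partial_sum` (also our name) evaluates $\sum_{k=1}^{N} ((kx))/k^2$, the kind of partial sum graphed in figure 7.8.

```python
import math
from fractions import Fraction


def dbl(x):
    """Riemann's ((x)): x minus the nearest integer, set to 0 halfway between integers."""
    x = Fraction(x)
    if x - math.floor(x) == Fraction(1, 2):
        return Fraction(0)                      # e.g. 4.5 or -0.5
    return x - math.floor(x + Fraction(1, 2))   # subtract the nearest integer


def partial_sum(x, N):
    """N-th partial sum of Riemann's f: the sum of ((k x)) / k^2 for k = 1, ..., N."""
    x = Fraction(x)
    return sum(dbl(k * x) / k ** 2 for k in range(1, N + 1))


print(dbl("1.2"), dbl("-1.8"), dbl("1.7"), dbl("-1.3"), dbl("4.5"), dbl("-0.5"))
# 1/5 1/5 -3/10 -3/10 0 0
print(partial_sum("0.3", 7))   # the seventh partial sum at x = 0.3
```

Using floats instead of `Fraction` would make the half-integer test fragile, which is why the sketch stays exact.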


To get a sense of $f$, we have graphed its seventh partial sum, that is,

$$\frac{((x))}{1} + \frac{((2x))}{4} + \frac{((3x))}{9} + \frac{((4x))}{16} + \frac{((5x))}{25} + \frac{((6x))}{36} + \frac{((7x))}{49},$$

over the interval $[0, 1]$ in figure 7.8. Even at this stage it appears that the discontinuities of $f$ are
fast accumulating.

We observe that $\left|((kx))/k^2\right| \le \frac{1/2}{k^2} = \frac{1}{2k^2}$ for all $x$, and so the infinite series converges
everywhere by a comparison test with $\sum_{k=1}^{\infty} \frac{1}{2k^2}$. Riemann asserted, without
a complete proof, that $f$ is continuous at those points where each individual function $y = ((kx))$ is continuous, and this would include all the
irrationals. But he also asserted that, if $x = m/(2n)$ in lowest terms (so that $m$ is odd and shares no factor with $n$),
then $f$ has a jump at $x$ of length

$$\frac{1}{n^2}\left[1 + \frac{1}{9} + \frac{1}{25} + \frac{1}{49} + \cdots\right] = \frac{1}{n^2} \cdot \frac{\pi^2}{8} = \frac{\pi^2}{8n^2}.$$

(Here we have summed the series using Euler’s result from chapter 4.)

Figure 7.8

Thus, Riemann’s function has discontinuities at points like $1/2$, $3/4$, and $5/6$.
There are infinitely many such points between any two real
numbers, and so his function had infinitely many points of discontinuity
within any finite interval. This should meet anyone’s criterion for “highly
discontinuous.”
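The jump formula can be checked numerically, at least roughly. The sketch below is an illustration under stated assumptions, not Riemann’s argument: it sums $1 + 1/9 + 1/25 + \cdots$ to compare with $\pi^2/8$, then estimates how far a long partial sum of $f$ drops across $x = 1/(2n)$ by evaluating it just to the left and right of that point. The cutoff `N` and the offset `eps` are our own choices; `eps` is kept far smaller than the distance to any other jump of the terms included.

```python
import math


def dbl(x):
    """Riemann's ((x)) on floats: x minus the nearest integer, 0 at half-integers."""
    r = x - math.floor(x + 0.5)
    return 0.0 if r == -0.5 else r


def f_partial(x, N):
    """Partial sum of Riemann's function: sum of ((k x)) / k^2 for k = 1, ..., N."""
    return sum(dbl(k * x) / k ** 2 for k in range(1, N + 1))


# 1 + 1/9 + 1/25 + ... (odd squares only), compared with pi^2/8
odd_sum = sum(1 / j ** 2 for j in range(1, 200001, 2))
print(odd_sum, math.pi ** 2 / 8)

# Drop of the partial sum across x = 1/(2n); eps is far smaller than 1/(2nN),
# the distance from 1/(2n) to any other jump of the first N terms.
N, eps = 2000, 1e-7
for n in (1, 2, 3):
    x0 = 1 / (2 * n)
    drop = f_partial(x0 - eps, N) - f_partial(x0 + eps, N)
    print(n, round(drop, 4), round(math.pi ** 2 / (8 * n ** 2), 4))
```

The estimated drops fall slightly short of $\pi^2/(8n^2)$ only because the series is truncated at `N` terms.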

Nonetheless (and this is the amazing part) $\int_0^1 f(x)\,dx$ exists. Riemann
proved this by means of the integrability condition above. He began with
an arbitrary $\sigma > 0$, although to simplify our discussion, we shall specify
$\sigma = 1/20$. We must identify those points where the oscillation of the function
exceeds $1/20$, and these are rationals of the form $x = m/(2n)$. But the size of
the jump at such points is $\pi^2/(8n^2)$, so we need only consider the inequality

$$\frac{\pi^2}{8n^2} > \frac{1}{20}.$$

It follows that $n < \frac{\pi}{2}\sqrt{10} \approx 4.967$, and because $n$ is a whole
number, the only options are $n = 1, 2, 3$, or $4$. When we note as well that
$m$ is odd, that $m$ and $n$ have no common factors, and that $0 < m/(2n) < 1$, we conclude that
there are only finitely many such candidates. In this case, the points
in $[0, 1]$ where the function oscillates more than $1/20$ are

$$\frac{1}{8},\ \frac{1}{6},\ \frac{1}{4},\ \frac{3}{8},\ \frac{1}{2},\ \frac{5}{8},\ \frac{3}{4},\ \frac{5}{6},\ \text{and}\ \frac{7}{8}.$$

Because we have only finitely many points to deal with, we can create
a partition of $[0, 1]$ that places each of these within a very narrow subinterval,
the total length of which can be as small as we wish. For instance,
to include the nine points above in subintervals with total length less
than 1/100, we might begin our partition with

$$0 < x_1 = \frac{1}{8} - \frac{1}{10000} = \frac{1249}{10000} < x_2 = \frac{1}{8} + \frac{1}{10000} = \frac{1251}{10000},$$

thereby embedding the discontinuity at $x = 1/8$ in a subinterval of total
length $\delta_2 = \frac{2}{10000} = \frac{1}{5000}$. If we put equally narrow intervals
about each of the Type A points, then

$$\sum_{\text{Type A}} \delta_k = 9 \times \frac{1}{5000} < \frac{1}{100}.$$
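The finitely many Type A points can be listed mechanically. This sketch assumes the jump size $\pi^2/(8n^2)$ stated above, keeps only points $m/(2n)$ in lowest terms with $0 < m/(2n) < 1$, and compares against $\sigma = 1/20$ in floating point (harmless here, since no jump size lies near $1/20$). It then totals the lengths of subintervals of width $2/10000$, as in the partition above.

```python
import math
from fractions import Fraction

sigma = Fraction(1, 20)

# Jump points m/(2n) in (0, 1), in lowest terms (m odd, gcd(m, n) = 1), whose jump
# pi^2/(8 n^2) exceeds sigma. Only n <= 4 survive; scanning to n = 10 is a safety margin.
points = sorted(
    Fraction(m, 2 * n)
    for n in range(1, 11)
    for m in range(1, 2 * n, 2)
    if math.gcd(m, n) == 1 and math.pi ** 2 / (8 * n ** 2) > sigma
)
print([str(p) for p in points])
# ['1/8', '1/6', '1/4', '3/8', '1/2', '5/8', '3/4', '5/6', '7/8']

# Enclose each point in a subinterval of length 2/10000, as in the partition above.
total_length = len(points) * Fraction(2, 10000)
print(total_length, total_length < Fraction(1, 100))   # 9/5000 True
```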

The critical issue here is the finite number of points where the oscillation
exceeds a given $\sigma$. Riemann summarized the situation as follows: “In
all intervals which do not contain these jumps, the oscillations are less
than $\sigma$ and . . . the total length of the intervals that contain these jumps
can, at our pleasure, be made small” [12].

Riemann had constructed a function with infinitely many disconti- 
nuities in any interval yet that met his integrability condition. It was a 
peculiar creation, one that is now known as Riemann’s pathological
function, where the adjective carries the connotation of being, in some 
sense, “sick.” 

Of course, Riemann had not answered the question, “How discontin- 
uous can an integrable function be?” But he had shown that integrable 
functions could be stunningly discontinuous. To those critics who sneered 
that an example as weird as Riemann’s was of no practical use, he offered a 
persuasive rejoinder: “This topic stands in the closest association with the 
principles of infinitesimal analysis and can serve to bring to these principles
greater clarity and precision. In this respect, the topic has an immediate
interest” [13]. Riemann’s pathological function had precisely this effect,
even if it did provide a blow to the mathematical intuition. As we shall see, 
more intuition-busters were in store for analysts of the nineteenth century. 


THE RIEMANN REARRANGEMENT THEOREM 


To be sure, Riemann is best known for his theory of the integral, but 
we end this chapter in a different corner of analysis, with a Riemannian 
result that may be less important than whimsical, but one that never ceases 
to amaze the first-time student. 

We begin by recalling the Leibniz series from chapter 2, namely, 


$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots.$$

Suppose we rearrange the terms of this series in
the following manner: take the first two positive terms followed by the first 
negative; take the next two positive terms followed by the second negative; 


and so on. After grouping this rearrangement into threesomes, we have 


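$$\left(1 + \frac{1}{5} - \frac{1}{3}\right) + \left(\frac{1}{9} + \frac{1}{13} - \frac{1}{7}\right) + \left(\frac{1}{17} + \frac{1}{21} - \frac{1}{11}\right) + \left(\frac{1}{25} + \frac{1}{29} - \frac{1}{15}\right) + \cdots \qquad (9)$$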


A moment’s thought reveals that the expressions in parentheses look like

$$\frac{1}{8k-7} + \frac{1}{8k-3} - \frac{1}{4k-1} = \frac{24k - 11}{(8k-7)(8k-3)(4k-1)} \quad \text{for } k = 1, 2, 3, 4, 5, \ldots$$

Because $k \ge 1$, both the numerator and denominator of this last fraction
must be positive, and so the value of each threesome in (9) will be
positive as well. We thus can say the following about the rearranged series:


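$$\left(1 + \frac{1}{5} - \frac{1}{3}\right) + \left(\frac{1}{9} + \frac{1}{13} - \frac{1}{7}\right) + \left(\frac{1}{17} + \frac{1}{21} - \frac{1}{11}\right) + \left(\frac{1}{25} + \frac{1}{29} - \frac{1}{15}\right) + \cdots > \left(1 + \frac{1}{5} - \frac{1}{3}\right) + 0 + 0 + 0 + \cdots,$$

and these can be combined into

$$1 + \frac{1}{5} - \frac{1}{3} + \frac{1}{9} + \frac{1}{13} - \frac{1}{7} + \frac{1}{17} + \frac{1}{21} - \frac{1}{11} + \cdots > \frac{13}{15} = 0.8666\ldots$$

On the other hand, Leibniz had proved that the original series

$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots = \frac{\pi}{4} \approx 0.7854.$$

We are left with an inescapable
conclusion: the rearranged series, whose sum has been shown to exceed
0.8666, cannot converge to the same number as the original. By altering
not the terms of the series but their position, we have changed the sum.
This seems mighty odd.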
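Both halves of the comparison above are easy to confirm by machine. The sketch below first checks the identity for the threesomes with exact fractions (for the first fifty values of $k$ only), then prints partial sums of the original series and of the rearrangement (9). The function names and cutoffs are our own choices.

```python
import math
from fractions import Fraction

# Each threesome of (9) equals (24k - 11)/((8k - 7)(8k - 3)(4k - 1)), which is positive.
for k in range(1, 51):
    threesome = Fraction(1, 8 * k - 7) + Fraction(1, 8 * k - 3) - Fraction(1, 4 * k - 1)
    assert threesome == Fraction(24 * k - 11, (8 * k - 7) * (8 * k - 3) * (4 * k - 1))
    assert threesome > 0


def leibniz_partial(N):
    """Sum of the first N terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** j / (2 * j + 1) for j in range(N))


def rearranged_partial(K):
    """Sum of the first K threesomes of (9): two positive terms, then one negative."""
    return sum(1 / (8 * k - 7) + 1 / (8 * k - 3) - 1 / (4 * k - 1) for k in range(1, K + 1))


print(leibniz_partial(300000), math.pi / 4)   # the two agree to about five decimals
print(rearranged_partial(100000), 13 / 15)    # the first exceeds 13/15 = 0.8666...
```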

Actually, it gets worse, for Riemann showed how the Leibniz series 
can be rearranged to converge to any number at all! 

His reasoning is expedited by the introduction of some terminology
and a few well-known theorems. As we saw, it was Cauchy who said what
it means for an infinite series $\sum_{k=1}^{\infty} u_k$ to converge. A general series may, of
course, include both positive and negative terms, and this suggests that
we disregard the signs and look at $\sum_{k=1}^{\infty} |u_k|$ instead. If this latter series
converges, we say that $\sum_{k=1}^{\infty} u_k$ converges absolutely. If $\sum_{k=1}^{\infty} u_k$ converges but
$\sum_{k=1}^{\infty} |u_k|$ does not, the original series is said to converge conditionally.


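As an example, we return to the original series of Leibniz. It sums to
$\pi/4$, but the related series of absolute values diverges because

$$1 + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{9} + \cdots \ge \frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \frac{1}{8} + \frac{1}{10} + \cdots = \frac{1}{2}\left[1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots\right],$$

where we recognize the divergent harmonic series in the brackets. This
means that Leibniz’s series is conditionally convergent.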
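A quick numerical look at this comparison: the sketch below (function names ours) prints partial sums of $1 + \frac{1}{3} + \frac{1}{5} + \cdots$ beside half of the corresponding harmonic partial sums. Term by term $\frac{1}{2k-1} \ge \frac{1}{2k}$, so the first column always leads, and both grow without bound, though slowly.

```python
def abs_leibniz(N):
    """1 + 1/3 + 1/5 + ... + 1/(2N - 1), the first N absolute values."""
    return sum(1 / (2 * k - 1) for k in range(1, N + 1))


def half_harmonic(N):
    """(1/2)(1 + 1/2 + ... + 1/N), which equals 1/2 + 1/4 + ... + 1/(2N)."""
    return 0.5 * sum(1 / k for k in range(1, N + 1))


for N in (10, 1000, 100000):
    print(N, round(abs_leibniz(N), 4), round(half_harmonic(N), 4))
```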

It is customary when dealing with series of mixed signs to consider
the positives and the negatives separately. Following Riemann’s notation,
we write a series as $(a_1 + a_2 + a_3 + \cdots) + (-b_1 - b_2 - b_3 - \cdots)$,
where all the $a_k$ and $b_k$ are nonnegative. Riemann knew that if the original
series converged absolutely, then both of the series $\sum_{k=1}^{\infty} a_k$ and $\sum_{k=1}^{\infty} b_k$
converge; if the original series diverged, then at least one of $\sum_{k=1}^{\infty} a_k$ and $\sum_{k=1}^{\infty} b_k$
diverges to infinity; and if the original converged conditionally, then both
$\sum_{k=1}^{\infty} a_k$ and $\sum_{k=1}^{\infty} b_k$ diverge to infinity.

It was Dirichlet who showed that any rearrangement of an absolutely 
convergent series must converge to the same sum as the original [14]. 
For absolutely convergent series, repositioning the terms has no impact 
whatever. 

But for conditionally convergent series, we reach a dramatically differ- 
ent conclusion: if a series converges conditionally, it can be rearranged to 
converge to whatever number we wish. With some alliterative excess, we 
might call this Riemann’s remarkable rearrangement result. Here is the
idea of his proof. 

Letting C be a fixed number—our “target,” so to speak—Riemann 
began thus: “One alternately takes sufficiently many positive terms of the 
series that their sum exceeds C and then sufficiently many negative terms 
that the (combined) sum is less than C” [15]. To see what he was getting 
at, we stipulate that our target C is positive. Starting with the positive 
terms, we find the smallest $m$ so that $a_1 + a_2 + a_3 + \cdots + a_m > C$. There
surely is such an index because $\sum_{k=1}^{\infty} a_k$ diverges to infinity. One next
considers the negative terms and chooses the smallest $n$ so that $a_1 + a_2 +
a_3 + \cdots + a_m - b_1 - b_2 - \cdots - b_n < C$. Again, we know such an index
exists because the divergent series $\sum_{k=1}^{\infty} b_k$ must eventually exceed $(a_1 + a_2 +
a_3 + \cdots + a_m) - C$. But $a_1 + a_2 + a_3 + \cdots + a_m - b_1 - b_2 - \cdots - b_n$ is a
rearrangement of terms of the original series whose sum can be no further
from $C$ than $b_n$. The process is then repeated, adding some $a_k$ and subtracting
some $b_k$ so that the difference between $C$ and the sum of these
rearranged terms is less than the most recently subtracted $b_k$. Because the original series converges,
we know its general term goes to zero, so $\lim_{k \to \infty} b_k = 0$ as well. The
series rearranged by his alternating scheme will converge to $C$ as claimed.
It is quite wonderful.

To illustrate, suppose we sought a rearrangement of Leibniz’s series 
that would converge to, say, 1.10. We would begin with sufficiently many
positive terms to exceed this: $1 + \frac{1}{5} = 1.2 > 1.10$. Then we would subtract
a negative term to bring us below 1.10:

$$1 + \frac{1}{5} - \frac{1}{3} \approx 0.8667 < 1.10.$$

Then we add back some positive terms until we again surpass 1.10, then
bounce back with some negatives, and so on. With this recipe, the
rearranged Leibniz series that converges to 1.10 will begin as follows:

$$\left(1 + \frac{1}{5}\right) - \frac{1}{3} + \left(\frac{1}{9} + \frac{1}{13} + \frac{1}{17}\right) - \frac{1}{7} + \left(\frac{1}{21} + \frac{1}{25} + \frac{1}{29} + \frac{1}{33}\right) - \frac{1}{11} + \cdots$$
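Riemann’s recipe is, in effect, a greedy algorithm, and the sketch below follows it for the Leibniz series. The function name `riemann_rearrangement` is hypothetical; it takes the positive terms $a_k = 1/(4k-3)$ and the negative terms $b_k = 1/(4k-1)$ in their original order, alternates phases exactly as described above, and keeps exact fractions so the comparisons with $C$ are unaffected by rounding.

```python
from fractions import Fraction
from itertools import count


def riemann_rearrangement(C, num_terms):
    """Rearrange 1 - 1/3 + 1/5 - 1/7 + ... by Riemann's alternating recipe.

    Positive terms a_k = 1/(4k - 3) and negative terms b_k = 1/(4k - 1) are each
    taken in their original order. Returns the first num_terms rearranged terms
    and their running sum.
    """
    positives = (Fraction(1, 4 * k - 3) for k in count(1))
    negatives = (Fraction(1, 4 * k - 1) for k in count(1))
    total, terms = Fraction(0), []
    while len(terms) < num_terms:
        # Take positive terms until the running sum exceeds C ...
        while total <= C and len(terms) < num_terms:
            a = next(positives)
            total += a
            terms.append(a)
        # ... then negative terms until it falls below C.
        while total >= C and len(terms) < num_terms:
            b = next(negatives)
            total -= b
            terms.append(-b)
    return terms, total


terms, _ = riemann_rearrangement(Fraction(11, 10), 12)
print([t.denominator if t > 0 else -t.denominator for t in terms])
# [1, 5, -3, 9, 13, 17, -7, 21, 25, 29, 33, -11]
_, total = riemann_rearrangement(Fraction(11, 10), 500)
print(float(total))   # close to 1.10
```

The printed denominators (with signs) reproduce the opening of the rearranged series shown above; changing `C` retargets the same series at any other number.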


Once seen, Riemann’s argument seems self-evident. Nonetheless, his 
rearrangement theorem demonstrates in dramatic fashion that summing 
infinite series is a tricky business. By simply rearranging the terms we can 
drastically alter the answer. As has been observed previously, the study of 
infinite processes, which is to say analysis, can carry us into deep waters. 

With that, we leave Georg Friedrich Bernhard Riemann, although no 
journey through nineteenth century analysis can leave him for long. More 
than anyone, he established the integral as a primary player in the calcu- 
lus enterprise. And his ideas would serve as the point of departure for 
Henri Lebesgue, who, as we shall see in the book’s final chapter, picked up
where Riemann left off to develop his own revolutionary theory of inte- 
gration. 


CHAPTER 8 




Liouville 


Joseph Liouville 


Generality lies at the heart of modern analysis, a trend already evi- 
dent in the limit theorems of Cauchy or the integrals of Riemann. More 
than their predecessors, these mathematicians defined key concepts inclu- 
sively and drew conclusions valid not for one or two cases but for enor- 
mous families. It was a most significant development. 

Yet the century witnessed another, seemingly opposite, phenomenon: 
the growing importance of the explicit example and the specific counterex- 
ample. These deserve our attention alongside the general theorems of the 
preceding pages. In this chapter, we examine Joseph Liouville’s discovery of 
the first transcendental number in 1851; in the next, we consider Karl 
Weierstrass’s astonishingly pathological function from 1872. Each of these 
was a major achievement of its time, and each reminds us that analysis 
would be incomplete without the clarification provided by individual 
examples. 
To study transcendentals, we need some background on where the
problem originated, how it was refined over the decades, and why its res- 
olution was such a grand achievement. We start, as did calculus itself, in 
the seventeenth century. 


THE ALGEBRAIC AND THE TRANSCENDENTAL 


It appears to have been Leibniz who first used the term “transcenden- 
tal” in a mathematical classification scheme. Writing about his newly 
invented differential calculus, Leibniz noted its applicability to fractions, 
roots, and similar algebraic quantities, but then added, “It is clear that our 
method also covers transcendental curves—those that cannot be reduced 
by algebraic computation or have no particular degree—and thus holds in 
a most general way” [1]. Here Leibniz wanted to separate those entities
that were algebraic, and thus reasonably straightforward, from those that 
were intrinsically more sophisticated. 

The distinction was refined by Euler in the eighteenth century. In his 
Introductio, he listed the so-called algebraic operations as “addition, sub- 
traction, multiplication, division, raising to a power, and extraction of 
roots,” as well as “the solution of equations.” Any other operations were 
transcendental, such as those involving “exponentials, logarithms, and 
others which integral calculus supplies in abundance” [2]. He even went 
so far as to mention transcendental quantities and gave as an example “log- 
arithms of numbers that are not powers of the base,” although he provid- 
ed no airtight definition nor rigorous proof [3]. 

Our mathematical forebears had the right idea, even if they failed to 
express it precisely. To them it was evident that certain mathematical objects, 
be they curves, functions, or numbers, were accessible via the fundamental 
operations of algebra, whereas others were sufficiently complicated to tran- 
scend algebra altogether and thereby earn the name “transcendental.” 

After contributions from such late eighteenth century mathematicians 
as Legendre, an unambiguous definition appeared. A real number was 
said to be algebraic if it solved some polynomial equation with integer 
coefficients. That is, $x_0$ is an algebraic number if there exists a polynomial
$P(x) = ax^n + bx^{n-1} + cx^{n-2} + \cdots + gx + h$, where $a, b, c, \ldots, g$, and $h$ are
integers and such that $P(x_0) = 0$. For instance, $\sqrt{2}$ is algebraic because it is
a solution of $x^2 - 2 = 0$, a quadratic equation with integer coefficients. Less
obviously, the number $\sqrt{2} + \sqrt[3]{5}$ is algebraic, for it solves $x^6 - 6x^4 - 10x^3 +
12x^2 - 60x + 17 = 0$.
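The claim about $\sqrt{2} + \sqrt[3]{5}$ can be checked by hand: if $x = \sqrt{2} + \sqrt[3]{5}$, then $(x - \sqrt{2})^3 = 5$, which rearranges to $x^3 + 6x - 5 = \sqrt{2}\,(3x^2 + 2)$, and squaring both sides yields the sextic above. A floating-point check, using a hypothetical `horner` helper, is sketched below; it shows only that the value vanishes up to rounding, not exactly.

```python
import math


def horner(coeffs, x):
    """Evaluate a polynomial whose coefficients run from the highest degree down."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value


sextic = [1, 0, -6, -10, 12, -60, 17]    # x^6 - 6x^4 - 10x^3 + 12x^2 - 60x + 17
x0 = math.sqrt(2) + 5 ** (1 / 3)         # sqrt(2) + cube root of 5
print(horner([1, 0, -2], math.sqrt(2)))  # x^2 - 2 at sqrt(2): zero up to rounding
print(horner(sextic, x0))                # zero up to rounding
```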
From a geometric perspective, an algebraic number is the x-intercept
of the graph of y = P(x), where P is a polynomial with integer coefficients 
(see figure 8.1). If we imagine graphing on the same axes all linear, all quad- 
ratic, all cubic—generally all polynomials whose coefficients are integers— 
then the infinite collection of their x-intercepts will be the algebraic 
numbers. 

An obvious question arises: Is there anything else? To allow for this 
possibility, we say a real number is transcendental if it is not algebraic. Any 
real number must, by sheer logic, fall into one category or the other. 

But are there any transcendentals? A piece of terminology, after all, 
does not guarantee existence. A mammalogist might just as well define a 
dolphin to be algebraic if it lives in water and to be transcendental if it does 
not. Here, the concept of a transcendental dolphin is unambiguous, but 
no such thing exists. 

Mathematicians had to face a similar possibility. Could transcendental 
numbers be a well-defined figment of the imagination? Might all those 
(algebraic) x-intercepts cover the line completely? If not, where should 
one look for a number that is not the intercept of any polynomial equation 
with integer coefficients? 

As a first step toward an answer, we note that a transcendental number
must be irrational. For, if $x_0 = a/b$ is rational, then $x_0$ obviously satisfies
the first-degree equation $bx - a = 0$, whose coefficients $b$ and $-a$ are
integers. Indeed, the rationals are precisely those algebraic numbers satis- 
fying linear equations with integer coefficients. 

Of course, not every algebraic number is rational, as is clear from the
algebraic irrationals $\sqrt{2}$ and $\sqrt{2} + \sqrt[3]{5}$. Algebraic numbers thus represent a
generalization of the rationals in that we now drop the requirement that 
they solve polynomials of the first degree (although we retain the restric- 
tion that coefficients be integers). 


Figure 8.1 (the graph of $y = P(x)$; its x-intercepts are algebraic numbers)




Transcendentals, if they exist, must lurk among the irrationals. From
the time of the Greeks, roots like $\sqrt{2}$ were known to be irrational, and by
the end of the eighteenth century, the irrationality of the constants $e$ and $\pi$
had been established, respectively, by Euler in 1737 and Johann Lambert 
(1728-1777) in 1768 [4]. But proving irrationality is a far easier task than 
proving transcendence. 

As we noted, Euler conjectured that the number $\log_2 3$ is transcendental,
and Legendre believed that $\pi$ was as well [5]. However, beliefs of
mathematicians, no matter how fervently held, prove nothing. Deep into the
nineteenth century, the existence of even a single transcendental number 
had yet to be demonstrated. It remained possible that these might occupy 
the same empty niche as those transcendental dolphins. 

An example was provided at long last by the French mathematician 
Joseph Liouville (1809-1882). Modern students may remember his name 
from Sturm–Liouville theory in differential equations or from Liouville’s
theorem (“an entire, bounded function is constant”) in complex analysis. 
He contributed significantly to such applied areas as electricity and ther- 
modynamics and, in an entirely different arena, was elected to the Assem- 
bly of France during the tumultuous year of 1848. On top of all of this, for 
thirty-nine years he edited one of the most influential journals in the his- 
tory of mathematics, originally titled Journal de mathématiques pures et 
appliquées but often referred to simply as the Journal de Liouville. In this 
way, he was responsible for transmitting mathematical ideas to colleagues 
around Europe and the world [6]. 

Within real analysis, Liouville is remembered for two significant discov- 
eries. First was his proof that certain elementary functions cannot have ele- 
mentary antiderivatives. Anyone who has taken calculus will remember 
applying clever schemes to find indefinite integrals. Although these matters 
are no longer addressed with quite as much zeal as in the past, calculus 
courses still cover techniques like integration by parts and integration by
partial fractions that allow us to compute such antiderivatives as

$$\int x^2 e^{-x}\,dx = -x^2 e^{-x} - 2x e^{-x} - 2e^{-x} + C$$

or the considerably less self-evident

$$\int \sqrt{\tan x}\,dx = \frac{1}{2\sqrt{2}} \ln\left(\frac{\tan x - \sqrt{2\tan x} + 1}{\tan x + \sqrt{2\tan x} + 1}\right) + \frac{1}{\sqrt{2}} \arctan\left(\frac{\sqrt{2\tan x}}{1 - \tan x}\right) + C.$$

Note that both the integrands and their antiderivatives are composed of
functions from the standard Eulerian repertoire: algebraic, trigonometric,
logarithmic, and their inverses. These are “elementary” integrals with “elementary”
antiderivatives.
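Both antiderivatives can be spot-checked by differentiating numerically. In this sketch (function names ours), a central difference approximates the derivative of each claimed antiderivative at a few points. We restrict the $\sqrt{\tan x}$ check to $0 < x < \pi/4$, where $\tan x < 1$ and the arctangent term has no jump; beyond that range the formula holds only piecewise, with a change of constant.

```python
import math


def F_poly_exp(x):
    """Claimed antiderivative of x^2 e^(-x)."""
    return -x ** 2 * math.exp(-x) - 2 * x * math.exp(-x) - 2 * math.exp(-x)


def F_sqrt_tan(x):
    """Claimed antiderivative of sqrt(tan x), used here only for 0 < x < pi/4."""
    t = math.tan(x)
    r = math.sqrt(2 * t)
    return (math.log((t - r + 1) / (t + r + 1)) / (2 * math.sqrt(2))
            + math.atan(r / (1 - t)) / math.sqrt(2))


def derivative(F, x, h=1e-5):
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


for x in (0.2, 0.5, 0.7):
    print(x,
          derivative(F_poly_exp, x), x ** 2 * math.exp(-x),
          derivative(F_sqrt_tan, x), math.sqrt(math.tan(x)))
```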

Alas, even the most diligent integrator will be stymied in his or her quest
for $\int \sqrt{\sin x}\,dx$ as a finite combination of simple functions. It was Liouville
who proved in an 1835 paper why a closed-form answer for certain integrals
is impossible. For instance, he wrote that, “One easily convinces oneself
by our method that the integral $\int \frac{e^x}{x}\,dx$, which has greatly occupied
geometers, is impossible in finite form” [7]. The hope that easy functions
must have easy antiderivatives was destroyed forever.

In this chapter our object is Liouville’s other famous contribution: a 
proof that transcendental numbers exist. His original argument came in 
1844, although he refined and simplified the result in a classic 1851 paper 
(published in his own journal, of course) from which we take the proof 
that follows [8]. Before providing his example of a hitherto unseen tran- 
scendental, Liouville first had to prove an important inequality about irra- 
tional algebraic numbers and their rational neighbors. 


LIOUVILLE’S INEQUALITY 


As noted, a real number is algebraic if it is the solution to some polyno- 
mial equation with integer coefficients. Any number that solves one such 
equation, however, solves infinitely many. For instance, $\sqrt{2}$ is a solution of
the quadratic equation $x^2 - 2 = 0$, as well as the cubic equation $x^3 + x^2 - 2x
- 2 = (x^2 - 2)(x + 1) = 0$, the quartic equation $x^4 + 4x^3 + x^2 - 8x - 6
= (x^2 - 2)(x + 1)(x + 3) = 0$, and so on. Our first stipulation, then, is that we
use a polynomial of minimal degree. So, for the algebraic number $\sqrt{2}$, we
would employ the quadratic above and not its higher degree cousins.
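The factorizations above amount to multiplying coefficient lists. Here is a minimal sketch with a hypothetical helper `poly_mul`, storing coefficients from the highest degree down:

```python
def poly_mul(p, q):
    """Multiply integer polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


quad = [1, 0, -2]                      # x^2 - 2
cubic = poly_mul(quad, [1, 1])         # (x^2 - 2)(x + 1)
quartic = poly_mul(cubic, [1, 3])      # (x^2 - 2)(x + 1)(x + 3)
print(cubic)    # [1, 1, -2, -2]
print(quartic)  # [1, 4, 1, -8, -6]
```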

Suppose that $x_0$ is an irrational algebraic number. Following Liouville’s
notation, we denote its minimal-degree polynomial by

$$P(x) = ax^n + bx^{n-1} + cx^{n-2} + \cdots + gx + h, \qquad (1)$$

where $a, b, c, \ldots, g$, and $h$ are integers and $n \ge 2$ (as noted above, if $n = 1$,
the algebraic number is rational). Because $P(x_0) = 0$, the factor theorem
allows us to write

$$P(x) = (x - x_0)\,Q(x), \qquad (2)$$

where $Q$ is a polynomial of degree $n - 1$. Liouville wished to establish
a bound upon the size of $|Q(x)|$, at least for values of $x$ in the vicinity
of $x_0$. We give his proof and then follow it with a simpler alternative.
Liouville’s Inequality: If $x_0$ is an irrational algebraic number with
minimum-degree polynomial $P(x) = ax^n + bx^{n-1} + cx^{n-2} + \cdots + gx + h$
having integer coefficients and degree $n \ge 2$, then there exists a positive
real number $A$ so that, if $p/q$ is a rational number in $[x_0 - 1, x_0 + 1]$, then

$$\left|\frac{p}{q} - x_0\right| \ge \frac{1}{Aq^n}.$$

Proof: The argument has its share of fine points, but we begin with the
real polynomial $Q$ introduced in (2). This is continuous and thus
bounded on any closed, finite interval, so there exists an $A > 0$ with

$$|Q(x)| \le A \quad \text{for all } x \text{ in } [x_0 - 1, x_0 + 1]. \qquad (3)$$

Now consider any rational number $p/q$ within one unit of $x_0$, where
we insist that the rational be in lowest terms and that its denominator be
positive (i.e., that $q \ge 1$). We see by (3) that $|Q(p/q)| \le A$. We claim as well
that $P(p/q) \ne 0$, for otherwise we could factor $P(x) = \left[x - \frac{p}{q}\right] R(x)$, and
it can be shown that $R$ will be an $(n-1)$st-degree polynomial
having integer coefficients. Then $0 = P(x_0) = \left[x_0 - \frac{p}{q}\right] R(x_0)$, and yet
$x_0 - \frac{p}{q} \ne 0$ (because the rational $p/q$ differs from the irrational $x_0$),
and we would conclude that $R(x_0) = 0$. This, however, makes $x_0$ a root
of $R$, a polynomial with integer coefficients having lower degree than
$P$, in violation of the assumed minimality condition. It follows that $p/q$
is not a root of $P(x) = 0$.

Liouville returned to the minimal-degree polynomial in (1) and
defined $f(p, q) = q^n P(p/q)$. Note that

$$\begin{aligned} f(p, q) &= q^n P(p/q) \\ &= q^n\left[a(p/q)^n + b(p/q)^{n-1} + c(p/q)^{n-2} + \cdots + g(p/q) + h\right] \\ &= ap^n + bp^{n-1}q + cp^{n-2}q^2 + \cdots + gpq^{n-1} + hq^n. \end{aligned} \qquad (4)$$

From (4), he made a pair of simple but telling observations.




First, $f(p, q)$ is an integer, for its components $a, b, c, \ldots, g, h$, along
with $p$ and $q$, are all integers. Second, $f(p, q)$ cannot be zero, for, if $0 =
f(p, q) = q^n P(p/q)$, then either $q = 0$ or $P(p/q) = 0$. The former is impossible
because $q$ is a denominator, and the latter is impossible by our
discussion above. Thus, Liouville knew that $f(p, q)$ was a nonzero integer,
from which he deduced that

$$|q^n P(p/q)| = |f(p, q)| \ge 1. \qquad (5)$$

The rest of the proof followed quickly. From (3) and (5) and the
fact that $P(x) = (x - x_0)\,Q(x)$, he concluded that

$$1 \le |q^n P(p/q)| = q^n\,|p/q - x_0|\,|Q(p/q)| \le q^n\,|p/q - x_0|\,A.$$

Hence $|p/q - x_0| \ge 1/(Aq^n)$, and the demonstration was complete. Q.E.D.
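Liouville’s two observations about $f(p, q)$ can be watched in action for $x_0 = \sqrt{2}$, whose minimal polynomial is $x^2 - 2$. The sketch below (the name `f_pq` is ours) computes $f(p, q)$ from the expanded form (4) and checks, for every fraction $p/q$ in lowest terms with $q < 30$ and $0 < p/q < 3$, that it matches $q^n P(p/q)$ and is a nonzero integer. It illustrates finitely many cases; it is not a proof.

```python
from fractions import Fraction
from math import gcd


def f_pq(coeffs, p, q):
    """Liouville's f(p, q) = a p^n + b p^(n-1) q + ... + h q^n, as in (4).

    coeffs holds the integer coefficients a, b, ..., g, h, highest degree first.
    """
    n = len(coeffs) - 1
    return sum(c * p ** (n - i) * q ** i for i, c in enumerate(coeffs))


P = [1, 0, -2]   # minimal polynomial x^2 - 2 of sqrt(2), so n = 2
checked = 0
for q in range(1, 30):
    for p in range(1, 3 * q):
        if gcd(p, q) != 1:
            continue                                         # keep p/q in lowest terms
        value = f_pq(P, p, q)
        assert value == q ** 2 * (Fraction(p, q) ** 2 - 2)   # agrees with q^n P(p/q)
        assert value != 0                                    # a nonzero integer, as in (5)
        checked += 1
print(checked, "fractions checked")
```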


The role played by inequalities in Liouville’s proof is striking. Modern 
analysis is sometimes called the “science of inequalities,” a characteriza- 
tion that is appropriate here and would become ever more so as the cen- 
tury progressed. 

We promised an alternate proof of Liouville’s result. This time, our 
argument features Cauchy’s mean value theorem in a starring role [9]. 


Liouville’s Inequality Revisited: If $x_0$ is an irrational algebraic number
with minimum-degree polynomial $P(x) = ax^n + bx^{n-1} + cx^{n-2} + \cdots +
gx + h$ having integer coefficients and degree $n \ge 2$, then there exists
an $A > 0$ such that, if $p/q$ is a rational number in $[x_0 - 1, x_0 + 1]$, then

$$\left|\frac{p}{q} - x_0\right| \ge \frac{1}{Aq^n}.$$

Proof: Differentiating $P$, we find $P'(x) = nax^{n-1} + (n-1)bx^{n-2} + (n-2)cx^{n-3}
+ \cdots + g$. This $(n-1)$st-degree polynomial is bounded on $[x_0 - 1, x_0 + 1]$,
so there is an $A > 0$ for which $|P'(x)| \le A$ for all $x \in [x_0 - 1, x_0 + 1]$. Letting
$p/q$ be a rational number within one unit of $x_0$ and applying the
mean value theorem to $P$, we know there exists a point $c$ between $x_0$
and $p/q$ for which

$$\frac{P(p/q) - P(x_0)}{p/q - x_0} = P'(c). \qquad (6)$$
Given that $P(x_0) = 0$ and $c$ belongs to $[x_0 - 1, x_0 + 1]$, we see from (6)
that

$$|P(p/q)| = |p/q - x_0| \cdot |P'(c)| \le A\,|p/q - x_0|.$$

Consequently, $|q^n P(p/q)| \le Aq^n\,|p/q - x_0|$. But, as noted above, $q^n P(p/q)$ is
a nonzero integer, and so $1 \le Aq^n\,|p/q - x_0|$. The result follows. Q.E.D.


At this point, an example might be of interest. We consider the algebraic
irrational $x_0 = \sqrt{2}$. Here the minimal-degree polynomial is $P(x) = x^2 - 2$,
the derivative of which is $P'(x) = 2x$. It is clear that, on the interval
$[\sqrt{2} - 1, \sqrt{2} + 1]$, $P'$ is bounded by $A = 2\sqrt{2} + 2$. Liouville’s inequality
shows that, if $p/q$ is any rational in this closed interval, then

$$\left|\frac{p}{q} - \sqrt{2}\right| \ge \frac{1}{(2\sqrt{2} + 2)q^2}.$$

The numerically inclined may wish to verify this for, say, $q = 5$. In this
case, the inequality becomes

$$\left|\frac{p}{5} - \sqrt{2}\right| \ge \frac{1}{50\sqrt{2} + 50} \approx 0.00828.$$

We then check all the “fifths” within one unit of $\sqrt{2}$. Fortunately, there are only ten
such fractions, and all abide by Liouville’s inequality:


p/5            |p/5 − √2|
3/5 = 0.60     0.8142
4/5 = 0.80     0.6142
5/5 = 1.00     0.4142
6/5 = 1.20     0.2142
7/5 = 1.40     0.0142
8/5 = 1.60     0.1858
9/5 = 1.80     0.3858
10/5 = 2.00    0.5858
11/5 = 2.20    0.7858
12/5 = 2.40    0.9858
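The table is straightforward to reproduce. This sketch uses floating point, which is ample at four decimal places, together with the constant $A = 2\sqrt{2} + 2$ from the text:

```python
import math

x0 = math.sqrt(2)
A = 2 * math.sqrt(2) + 2            # bound for |P'(x)| = |2x| on [sqrt(2) - 1, sqrt(2) + 1]
q = 5
bound = 1 / (A * q ** 2)            # Liouville's lower bound 1/(A q^2)

fifths = [p for p in range(1, 4 * q) if abs(p / q - x0) <= 1]   # within one unit
print(f"bound = {bound:.5f}")       # bound = 0.00828
for p in fifths:
    gap = abs(p / q - x0)
    print(f"{p}/{q} = {p / q:.2f}   {gap:.4f}   {'ok' if gap >= bound else 'FAILS'}")
```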


The example suggests something more: we can in general remove the
restriction that $p/q$ lies close to $x_0$. That is, we specify $A^*$ to be the greater
of 1 and $A$, where $A$ is determined as above. If $p/q$ is a rational within one
unit of $x_0$, then

$$\left|\frac{p}{q} - x_0\right| \ge \frac{1}{Aq^n} \ge \frac{1}{A^* q^n} \quad \text{because } A^* \ge A.$$

On the other hand, if $p/q$ is a rational more than one unit away from $x_0$, then

$$\left|\frac{p}{q} - x_0\right| > 1 \ge \frac{1}{A^*} \ge \frac{1}{A^* q^n} \quad \text{because } A^* \ge 1 \text{ and } q \ge 1 \text{ as well.}$$

The upshot of this last observation is that there exists an $A^* > 0$ for which

$$\left|\frac{p}{q} - x_0\right| \ge \frac{1}{A^* q^n}$$

regardless of the proximity of $p/q$ to $x_0$.

Informally, Liouville’s inequality shows that rational numbers are poor
approximators of irrational algebraics, for there must be a gap of at least
$1/(A^* q^n)$ between $x_0$ and any rational $p/q$. It is not easy to imagine how Liouville
noticed this. That he did so, and offered a clever proof, is a tribute to
his mathematical ability. Yet all might have been forgotten had he not taken
the next step: he used his result to find the world’s first transcendental.


LIOUVILLE’S TRANSCENDENTAL NUMBER 


We first offer a word about the logical strategy. Liouville sought an 
irrational number that was inconsistent with the conclusion of the inequality
above. This irrational would thus violate the inequality’s assumptions,
which means it would not be algebraic. If Liouville could pull this off, he 
would have corralled a specific transcendental. Remarkably enough, he did 
just that [10]. 


Theorem: The number

$$x_0 = \sum_{k=1}^{\infty} \frac{1}{10^{k!}} = \frac{1}{10} + \frac{1}{10^2} + \frac{1}{10^6} + \frac{1}{10^{24}} + \frac{1}{10^{120}} + \cdots$$

is transcendental.

Proof: There are three issues to address, and we treat them one at a time.
First, we claim that the series defining $x_0$ is convergent, and this
follows easily from the comparison test. That is, $k! \ge k$ guarantees that
$\frac{1}{10^{k!}} \le \frac{1}{10^k}$, and so $\sum_{k=1}^{\infty} \frac{1}{10^{k!}}$ converges because

$$\sum_{k=1}^{\infty} \frac{1}{10^k} = \frac{1/10}{1 - 1/10} = \frac{1}{9}.$$

In short, $x_0$ is a real number.

Second, we assert that $x_0$ is irrational. This is clear from its decimal
expansion, $0.1100010000000\ldots$, where nonzero entries occupy the
first place, the second, the sixth, the twenty-fourth, the one-hundred
twentieth, and so on, with ever-longer strings of 0s separating the
increasingly lonely 1s. Obviously no finite block of this decimal expansion
repeats, so $x_0$ is irrational.
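The decimal pattern can be generated exactly. The sketch sums the first five terms of the series with Python fractions; every later term begins at the 720th decimal place or beyond, so the first 130 places printed are already final.

```python
from fractions import Fraction
from math import factorial

# Exact partial sum 1/10 + 1/10^2 + 1/10^6 + 1/10^24 + 1/10^120 (k = 1, ..., 5).
x_partial = sum(Fraction(1, 10 ** factorial(k)) for k in range(1, 6))
assert x_partial.denominator == 10 ** 120   # numerator ends in 1, so nothing cancels

# x_partial = numerator / 10^120 exactly, so pad to read off 130 decimal places.
places = 130
digits = str(x_partial.numerator * 10 ** (places - 120)).zfill(places)
ones = [i + 1 for i, d in enumerate(digits) if d == "1"]

print("0." + digits[:30] + "...")
print(ones)  # [1, 2, 6, 24, 120]
```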

The final step is the hardest: to show that Liouville’s number is
transcendental. To do this, we assume instead that $x_0$ is an algebraic
irrational with minimal polynomial of degree $n \ge 2$. By Liouville’s inequality,
there must exist an $A^* > 0$ such that, for any rational $p/q$, we
have

$$0 < \frac{1}{A^* q^n} \le \left|\frac{p}{q} - x_0\right|. \qquad (7)$$

We now choose an arbitrary whole number $m > n$ and look at the
partial sum

$$\sum_{k=1}^{m} \frac{1}{10^{k!}} = \frac{1}{10} + \frac{1}{10^2} + \frac{1}{10^6} + \cdots + \frac{1}{10^{m!}}.$$

If we combine these fractions, their common denominator would be $10^{m!}$, so we could
write the sum as

$$\sum_{k=1}^{m} \frac{1}{10^{k!}} = \frac{p_m}{10^{m!}},$$

where $p_m$ is a whole number. Thus, of course, $p_m/10^{m!}$ is a rational.


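Comparing this to $x_0$, we see that

$$\left|\frac{p_m}{10^{m!}} - x_0\right| = \sum_{k=m+1}^{\infty} \frac{1}{10^{k!}} = \frac{1}{10^{(m+1)!}} + \frac{1}{10^{(m+2)!}} + \frac{1}{10^{(m+3)!}} + \cdots.$$

An induction argument establishes that $(m + r)! \ge (m + 1)! + (r - 1)$ for any
whole number $r \ge 1$, and so

$$\frac{1}{10^{(m+r)!}} \le \frac{1}{10^{(m+1)! + (r-1)}} = \frac{1}{10^{(m+1)!} \times 10^{r-1}}.$$

As a consequence,

$$\begin{aligned}
\left|\frac{p_m}{10^{m!}} - x_0\right| &= \frac{1}{10^{(m+1)!}} + \frac{1}{10^{(m+2)!}} + \frac{1}{10^{(m+3)!}} + \cdots \\
&\le \frac{1}{10^{(m+1)!}} + \frac{1}{10^{(m+1)!} \times 10} + \frac{1}{10^{(m+1)!} \times 10^2} + \frac{1}{10^{(m+1)!} \times 10^3} + \cdots \\
&= \frac{1}{10^{(m+1)!}}\left[1 + \frac{1}{10} + \frac{1}{100} + \frac{1}{1000} + \cdots\right] \\
&= \frac{1}{10^{(m+1)!}} \cdot \frac{10}{9} < \frac{2}{10^{(m+1)!}}.
\end{aligned} \qquad (8)$$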
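A contradiction is now at hand because
$$\begin{aligned} 0 < \frac{1}{A^* (10^{m!})^n} &\le \left| \frac{p_m}{10^{m!}} - x_0 \right| && \text{by (7)} \\ &< \frac{2}{10^{(m+1)!}} && \text{by (8)} \\ &= \frac{2}{(10^{m!})^{m+1}} = \frac{2}{(10^{m!})^n \, (10^{m!})^{m+1-n}} < \frac{2}{(10^{m!})^n \, 10^m}, \end{aligned}$$
where the last step follows because $m > n$ implies that $m + 1 - n > 1$, so that $(10^{m!})^{m+1-n} > 10^{m!} \ge 10^m$.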
This long string of inequalities shows that, for the value of $A^*$
introduced above, we have
$$\frac{1}{A^* (10^{m!})^n} < \frac{2}{(10^{m!})^n \, 10^m} \quad \text{for all } m > n,$$
or simply that $2A^* > 10^m$ for all $m > n$. Such an inequality is absurd, for $2A^*$ is a fixed number,
whereas $10^m$ explodes to infinity as m gets large. Liouville had (at
last) reached a contradiction.
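The inequalities (7) and (8) can be watched in action with exact rational arithmetic. The sketch below is only an illustration: it fixes a hypothetical degree n = 2, brackets the true gap $x_0 - p_m/10^{m!}$ between a truncated tail and that truncation plus the same geometric bound used in (8), and confirms that the scaled gap $q^n \left| x_0 - p_m/q \right|$, with $q = 10^{m!}$, falls below $2/10^m$. No fixed positive lower bound $1/A^*$ can survive that.

```python
from fractions import Fraction
from math import factorial


def tail_bounds(m: int) -> tuple[Fraction, Fraction]:
    """Bracket x0 - p_m/10^(m!) = sum_{k>m} 1/10^(k!) from below and above."""
    lower = Fraction(1, 10 ** factorial(m + 1)) + Fraction(1, 10 ** factorial(m + 2))
    # Terms from k = m+3 on total at most (10/9)/10^((m+3)!), by the geometric
    # comparison used to derive inequality (8).
    upper = lower + Fraction(10, 9 * 10 ** factorial(m + 3))
    return lower, upper


n = 2  # hypothetical degree of a minimal polynomial for x0
for m in range(n + 1, n + 3):
    q = 10 ** factorial(m)
    lower, upper = tail_bounds(m)
    assert upper < Fraction(2, 10 ** factorial(m + 1))   # inequality (8)
    assert upper * q**n < Fraction(2, 10**m)              # scaled gap below 2/10^m
    print(m, float(lower * q**n), float(upper * q**n))
```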

By this time, the reader may need a gentle reminder of what was 
contradicted. It was the assumption that the irrational $x_0$ is algebraic.
There remains but one alternative: $x_0$ must be transcendental. And the
existence of such a number is what Joseph Liouville had set out to 
prove. Q.E.D. 


In his 1851 paper, Liouville observed that, although many had specu- 
lated on the existence of transcendentals, “I do not believe a proof has ever 
been given” to this end [11]. Now, one had. 

Strangely enough, Liouville regarded this achievement as something 
less than a total success, for his original hope had been to show that the 
number e was transcendental [12]. It is one thing to create a number, as 
Liouville did, and then prove its transcendence. It is quite another to do 
this for a number like e that was “already there.” With his typical flair, Eric 
Temple Bell observed that it is 


a much more difficult problem to prove that a particular suspect, 
like e or π, is or is not transcendental than it is to invent a whole
infinite class of transcendentals: ... the suspected number is
entire master of the situation, and it is the mathematician in this 
case, not the suspect, who takes orders. [13] 


We might say that Liouville demonstrated the transcendence of a num- 
ber no one had previously cared about but was unable to do the same for 
the ubiquitous constant e, about which mathematicians cared passionately. 
Still, it would be absurd to label him a failure when he found something his
predecessors had been seeking in vain for a hundred years. 

That original objective would soon be realized by one of his followers. In 
1873, Charles Hermite (1822-1901) showed that e was indeed a transcen- 
dental number. Nine years later Ferdinand Lindemann (1852-1939) proved 
the same about π. As is well known, the latter established the impossibility of
squaring the circle with compass and straightedge, a problem with origins in 
classical Greece that had gone unresolved not just for decades or centuries 
but for millennia [14]. The results of Hermite and Lindemann were impressive
pieces of reasoning that built upon Liouville's pioneering research.

To this day, determining whether a given number is transcendental 
ranks among the most difficult challenges in mathematics. Much work has 
been done on this front and many important theorems have been proved, 
but there remain vast holes in our understanding. Among the great achieve- 
ments, we should mention the 1934 proof of A. O. Gelfond (1906-1968), 
which demonstrated the transcendence of an entire family of numbers at 
once. He proved that if a is an algebraic number other than 0 or 1 and if b
is an irrational algebraic, then $a^b$ must be transcendental. This deep result
guarantees, for instance, that $2^{\sqrt{2}}$ and $(\sqrt{2} + \sqrt[3]{5})^{\sqrt{7}}$ are transcendental.
Among other numbers now known to be transcendental are $e^{\pi}$, $\ln(2)$,
and $\sin(1)$.

However, as of this writing, the nature of such “simple” numbers as 
$\pi^{\pi}$, $e^{e}$, and $\pi^{e}$ is yet to be established. Worse, although mathematicians
believe in their bones that both $\pi + e$ and $\pi \times e$ are transcendental, no one
has actually proved this [15]. We repeat: demonstrating transcendence is 
very, very hard. 

Returning to the subject at hand, we see how far mathematicians had 
come by the mid-nineteenth century. Liouville's technical abilities in
manipulating inequalities as well as his broader vision of how to attack so 
difficult a problem are impressive indeed. Analysis was coming of age. 

Yet this proof will serve as a dramatic counterpoint to our main theo- 
rem from chapter 11. There, we shall see how Georg Cantor found a 
remarkable shortcut to reach Liouville’s conclusion with a fraction of the 
work. In doing so, he changed the direction of mathematical analysis. The 
Liouville—Cantor interplay will serve as a powerful reminder of the con- 
tinuing vitality of mathematics. 

For now, Cantor must wait a bit. Our next object is the ultimate in 
nineteenth century rigor: the mathematics of Karl Weierstrass and the 
greatest analytic counterexample of all. 


CHAPTER 9 




Weierstrass 


Karl Weierstrass 


As we have seen, mathematicians of the nineteenth century imparted 
to the calculus a new level of rigor. By our standards, however, these achieve- 
ments were not beyond criticism. Reading mathematics from that period is a 
bit like listening to Chopin performed on a piano with a few keys out of 
tune: one can readily appreciate the genius of the music, yet now and then 
something does not quite ring true. The modern era would not arrive until 
the last vestige of imprecision disappeared and analytic arguments became, 
for all practical purposes, incontrovertible. The mathematician most respon- 
sible for this final transformation is Karl Weierstrass (1815-1897). 

He followed a nontraditional route to prominence. His student years 
had been those of an underachiever, featuring more beer and swordplay 
than is normally recommended. At age 30 Weierstrass found himself on 
the faculty of a German gymnasium (i.e., high school) far removed from the 
intellectual centers of Europe. By day, he instructed his pupils on the arts 


of arithmetic and calligraphy, and only after classes were finished and the
lessons corrected could young Weierstrass turn to his research [1]. 

In 1854 this unknown teacher from an unknown town published a 
memoir on Abelian integrals that astonished the mathematicians who read 
it. It was evident that the author, whoever he was, possessed an extraordi- 
nary talent. Within two years, Weierstrass had secured a position at the 
University of Berlin and found himself on one of the world’s great mathe- 
matics faculties. His was a true Cinderella story. 

Weierstrass's contributions to analysis were as profound as his pedagogical
skills were legendary. With a reputation that spread through Germany
and beyond, he attracted young mathematicians who wished to learn
from the master. A school of disciples formed at his feet. This was almost 
literally true, for severe vertigo required Weierstrass to lecture from an easy 
chair while a designated student wrote his words upon the board (an 
arrangement subsequent professors have envied but seldom replicated). 

If his teaching style was unusual, so was his attitude toward publica- 
tion. Although his classes were filled with new and important ideas, he 
often let others disseminate such information in their own writings. Thus 
one finds his results attributed somewhat loosely to the School of Weier- 
strass. Modern academics, operating in “publish or perish” mode, find it 
difficult to fathom such a nonpossessive view of scholarship. But Weier- 
strass acted as though creating significant mathematics was his job, and he 
would risk the perishing. 

Whether through his own publications or those of his lieutenants, the 
Weierstrassian school imparted to analysis an unparalleled logical preci- 
sion. He repaired subtle misconceptions, proved important theorems, and 
constructed a counterexample that left mathematicians shaking their 
heads. In this chapter, we shall see why Karl Weierstrass came to be 
known, in the parlance of the times, as the “father of modern analysis” [2]. 


BACK TO THE BASICS


We recall that Cauchy built his calculus upon limits, which he defined 
in these words: 


When the values successively attributed to a variable approach 
indefinitely to a fixed value, in a manner so as to end by differing 
from it by as little as one wishes, this last is called the limit of all 
the others. 
To us, aspects of this statement, for instance, the motion implied in the
term “approach,” seem less than satisfactory. Is something actually mov- 
ing? If so, must we consider concepts of time and space before talking of 
limits? And what does it mean for the process to “end”? The whole busi- 
ness needed one last revision. 

Contrast Cauchy’s words with the polished definition from the Weier- 
strassians: 


$$\lim_{x \to a} f(x) = L \text{ if and only if, for every } \varepsilon > 0, \text{ there exists a } \delta > 0 \text{ so that, if } 0 < |x - a| < \delta, \text{ then } |f(x) - L| < \varepsilon. \qquad (1)$$


Here nothing is in motion, and time is irrelevant. This is a static rather 
than dynamic definition and an arithmetic rather than a geometric one. At 
its core, it is nothing but a statement about inequalities. And it can be 
used as the foundation for unambiguous proofs of limit theorems, for 
example, that the limit of a sum is the sum of the limits. Such theorems 
could now be demonstrated with all the rigor of a proposition from Euclid. 
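For a concrete feel for definition (1), take the illustrative case $f(x) = x^2$, $a = 2$, $L = 4$ (our own choice, not an example from the text). If $|x - 2| < \delta \le 1$, then $|x + 2| < 5$, so $|x^2 - 4| < 5\delta$, and $\delta = \min\{1, \varepsilon/5\}$ does the job. The Python sketch below spot-checks that choice on a grid of points. A finite check only illustrates the inequality; the one-line argument just given is what proves it.

```python
def delta_for(eps: float) -> float:
    """A delta that works in definition (1) for f(x) = x**2, a = 2, L = 4."""
    return min(1.0, eps / 5)


def spot_check(eps: float, samples: int = 10_000) -> bool:
    """Test |x**2 - 4| < eps at sample points with 0 < |x - 2| < delta."""
    delta = delta_for(eps)
    for i in range(1, samples):
        offset = delta * i / samples          # strictly between 0 and delta
        for x in (2 - offset, 2 + offset):
            if not abs(x * x - 4) < eps:
                return False
    return True


for eps in (1.0, 0.01, 1e-6):
    print(eps, delta_for(eps), spot_check(eps))
```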

Some may argue that precision comes at a cost, for Weierstrass's austere
definition lacks the charm of intuition and the immediacy of geometry. To 
be sure, a statement like (1) takes some getting used to. But geometrical 
intuition was becoming suspect, and this purely analytic definition was in 
no way entangled with space or time. 

Besides reformulating key concepts, Weierstrass grasped their mean- 
ings as his predecessors had not. An example is uniform continuity, a 
property that Cauchy missed entirely. We recall that Cauchy defined continuity
on a point-by-point basis, saying that f is continuous at a if
$\lim_{x \to a} f(x) = f(a)$. In Weierstrassian language, this means that to every $\varepsilon > 0$
there corresponds a $\delta > 0$ so that, if $0 < |x - a| < \delta$, then $|f(x) - f(a)| < \varepsilon$.
Thus, for a fixed “target” $\varepsilon$ and a given a, we can find the necessary $\delta$. But
here $\delta$ depends on both $\varepsilon$ and a. Were we to keep the same $\varepsilon$ but consider a
different value of a, the choice of $\delta$ would, in general, have to be adjusted.
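A hypothetical example shows how δ depends on a. We use $f(x) = 1/x$ because it reappears below. At a point $a > 0$, one can check that $\delta = \varepsilon a^2/(1 + \varepsilon a)$ works. If $|x - a| < \delta$, then $x > a - \delta = a/(1 + \varepsilon a) > 0$, and
$$\left| \frac{1}{x} - \frac{1}{a} \right| = \frac{|x - a|}{xa} < \frac{\delta}{(a - \delta)a} = \varepsilon.$$
This Python sketch holds ε fixed and slides a toward 0. The printed δ shrinks each time.

```python
def delta_at(a: float, eps: float) -> float:
    """A delta that works for f(x) = 1/x at the point a > 0 (see the lead-in)."""
    return eps * a * a / (1 + eps * a)


eps = 0.1
for a in (1.0, 0.1, 0.01, 0.001):
    d = delta_at(a, eps)
    # spot-check two points just inside (a - d, a + d)
    ok = all(abs(1 / x - 1 / a) < eps for x in (a - 0.999 * d, a + 0.999 * d))
    print(f"a = {a:<6} delta = {d:.3e}  check passed: {ok}")
```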

It was Eduard Heine (1821-1881) who first drew this distinction in 
print, although he suggested that “the general idea” was conveyed to him 
by his mentor, Weierstrass [3]. Heine defined a function f to be uniformly 
continuous on its domain if, for every $\varepsilon > 0$, there exists a $\delta > 0$ so that, if x
and y are any two points in the domain within $\delta$ units of one another, then
$|f(x) - f(y)| < \varepsilon$. This means, in essence, that “one $\delta$ fits all,” so that points
within this uniform distance will have functional values within $\varepsilon$ of one
another.
It is clear that a uniformly continuous function will be continuous at
each individual point. The converse, however, is false, and the standard
counterexample is the function $f(x) = 1/x$ defined on the open interval
(0, 1), as shown in figure 9.1. This is certainly continuous at each point of
(0, 1), but it fails Heine's criterion for uniformity. To see why, we let $\varepsilon = 1$
and claim that there can be no $\delta > 0$ with the property that, when x and y
are chosen from (0, 1) with $|x - y| < \delta$, then $|f(x) - f(y)| = \left| \frac{1}{x} - \frac{1}{y} \right| < 1$.

For, given any proposed $\delta$, we can choose an integer $N > \max\{1/\delta, 1\}$ and
let $x = 1/(N+2)$ and $y = 1/N$. In this case, both x and y belong to (0, 1) and
$$|x - y| = \left| \frac{1}{N+2} - \frac{1}{N} \right| = \frac{2}{N(N+2)} < \frac{N+2}{N(N+2)} = \frac{1}{N} < \delta.$$
But $|f(x) - f(y)| = |(N+2) - N| = 2 > 1 = \varepsilon$. Thus the requirement for uniform
continuity is not met.
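The recipe in this argument is mechanical enough to run. The Python sketch below uses exact arithmetic from the standard `fractions` module, and the function name is ours. For any proposed δ it builds N, $x = 1/(N+2)$, and $y = 1/N$ exactly as in the text. It then confirms that x and y lie within δ of each other while their images under $f(x) = 1/x$ stay 2 apart.

```python
from fractions import Fraction


def witness_pair(delta: Fraction) -> tuple[Fraction, Fraction]:
    """Return x = 1/(N+2), y = 1/N with N the least integer exceeding max(1/delta, 1)."""
    N = int(max(1 / delta, Fraction(1))) + 1   # int() truncates, so N > max(1/delta, 1)
    return Fraction(1, N + 2), Fraction(1, N)


for delta in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**9)):
    x, y = witness_pair(delta)
    # the gap is below delta, yet the images differ by 2
    print(delta, abs(x - y) < delta, abs(1 / x - 1 / y))
```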


Figure 9.1 
A look back to chapter 6 reminds us that Cauchy had talked about
continuous functions but actually used uniform continuity in some of his 
proofs. Fortunately, a logical catastrophe was averted in 1872 when 
Heine proved that a function continuous on a closed, bounded interval 
[a,b] must in fact be uniformly continuous. That is, the distinction 
between continuity and uniform continuity disappears if we restrict our 
attention to [a, b]. (Note that the example above is defined on an open 
interval.) So, when Cauchy’s misconception occurred for functions on 
closed, bounded intervals, his proofs were “salvageable” thanks to Heine’s 
result. 

Weierstrass recognized an even more crucial dichotomy: that between 
pointwise and uniform convergence. These ideas warrant a brief digression. 

Suppose we have a sequence of functions, $f_1, f_2, f_3, \ldots, f_k, \ldots$, all
with the same domain. If we fix a point x in this domain and substitute it
into each function, we generate a sequence of numbers: $f_1(x), f_2(x),
f_3(x), \ldots, f_k(x), \ldots$. Assume that, for each individual x, this numerical
sequence converges. We then create a new function f defined at each point
x by $f(x) = \lim_{k \to \infty} f_k(x)$. We call f the “pointwise limit” of the $f_k$.

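For instance, consider the following sequence of functions on $[0, \pi]$:
$$f_1(x) = \sin x, \quad f_2(x) = \sin^2 x, \quad f_3(x) = \sin^3 x, \quad \ldots, \quad f_k(x) = \sin^k x, \quad \ldots,$$
the first three of which are graphed in figure 9.2.

We see that, for all $k \ge 1$, $f_k(\pi/2) = [\sin(\pi/2)]^k = 1$, and so $\lim_{k \to \infty} f_k(\pi/2) = \lim_{k \to \infty} 1 = 1$.
On the other hand, if x is in $[0, \pi]$ but $x \ne \pi/2$, then $\sin x = r$, where
$0 \le r < 1$, and so $\lim_{k \to \infty} f_k(x) = \lim_{k \to \infty} r^k = 0$. Hence the pointwise limit is
$$f(x) = \lim_{k \to \infty} f_k(x) = \begin{cases} 0 & \text{if } 0 \le x < \pi/2, \\ 1 & \text{if } x = \pi/2, \\ 0 & \text{if } \pi/2 < x \le \pi, \end{cases}$$
whose graph is shown in figure 9.3.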
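A few numbers make this pointwise behavior visible. In the sketch below we use plain floating-point Python, and the sample points are arbitrary. Away from π/2 the values of $\sin^k x$ collapse toward 0, while at π/2 they stay at 1. The second loop hints at why the approach to f is uneven. For every k there is a point $x_k = \arcsin(2^{-1/k}) \ne \pi/2$ where $f_k(x_k) = 1/2$, yet the limit function is 0 there.

```python
import math


def f_k(x: float, k: int) -> float:
    """The k-th function sin(x)**k on [0, pi]."""
    return math.sin(x) ** k


points = [0.5, 1.0, 1.5, math.pi / 2, 2.0, 3.0]
for k in (1, 10, 100, 1000):
    print(k, [round(f_k(x, k), 4) for x in points])

# Points x_k != pi/2 where f_k still equals 1/2, although the limit f is 0 there.
for k in (10, 100, 1000):
    x_k = math.asin(0.5 ** (1 / k))
    print(k, round(x_k, 6), round(f_k(x_k, k), 6))
```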

This example raises one of the great questions of analysis: if each of
the $f_k$ has a certain property and f is their pointwise limit, must f itself have
this property? In mathematical parlance, we ask whether a characteristic is
inherited by pointwise limits. If each $f_k$ is continuous, must f be continuous?
If each is integrable, must f be integrable?
Figure 9.2


The intuitive answer might be, “Sure, why not?” Alas, the world is not so 
simple. For instance, continuity is not inherited by pointwise limits, a source 
of confusion for Cauchy and other mathematicians of the past [4]. We need 
only look at the example above to see that the functions $f_k(x) = (\sin x)^k$ are
continuous everywhere, but their pointwise limit f in figure 9.3 is not continuous
at $x = \pi/2$. This same example shows that differentiability is not
inherited either.

Figure 9.3

What about integrals? Already in this book we have seen occasions 
where mathematicians assumed that 


$$\lim_{k \to \infty} \left[ \int_a^b f_k(x)\,dx \right] = \int_a^b \left[ \lim_{k \to \infty} f_k(x) \right] dx.$$


This asserts that we may safely interchange two important calculus opera- 
tions: integrate and then take the limit or take the limit and then integrate. 

To see that this too is in error, we define a sequence of functions $f_k$ on
[0, 1] by
$$f_k(x) = \begin{cases} 0 & \text{if } 0 \le x \le \frac{1}{2k}, \\ (16k^2)x - 8k & \text{if } \frac{1}{2k} \le x \le \frac{3}{4k}, \\ (-16k^2)x + 16k & \text{if } \frac{3}{4k} \le x \le \frac{1}{k}, \\ 0 & \text{if } \frac{1}{k} \le x \le 1. \end{cases}$$
Although this expression may look daunting, the graphs of $f_1$, $f_2$, and $f_3$ in
figure 9.4 reveal that the functions are fairly tame. Each is continuous, with
“spikes” of increasing height but decreasing width situated ever closer to
the origin.
Figure 9.4


Because the $f_k$ are continuous, they can be integrated, and it is easy to
evaluate their integrals as triangular areas (see figure 9.5):
$$\int_0^1 f_k(x)\,dx = \text{Area of triangle} = \frac{1}{2} b \times h = \frac{1}{2} \times \left( \frac{1}{2k} \right) \times (4k) = 1.$$
So, as the bases of these triangular regions get smaller, their heights grow
in such a way that the triangular areas remain constant. Clearly, then,
$$\lim_{k \to \infty} \left[ \int_0^1 f_k(x)\,dx \right] = \lim_{k \to \infty} 1 = 1. \qquad (2)$$
Figure 9.5


On the other hand, we assert that the pointwise limit of the $f_k$ is zero
everywhere on [0, 1]. Certainly $f(0) = 0$, because $f_k(0) = 0$ for each k. And
if $0 < x \le 1$, we choose a whole number N so that $1/N < x$ and observe
that for all subsequent functions, that is, for all $f_k$ with $k \ge N$, the “spike”
has moved to the left of x, making $f_k(x) = 0$. Thus $f(x) = \lim_{k \to \infty} f_k(x) = 0$
as well. As a consequence, we see that
$$\int_0^1 \left[ \lim_{k \to \infty} f_k(x) \right] dx = \int_0^1 f(x)\,dx = \int_0^1 0\,dx = 0.$$
Comparing this to (2) reveals the disheartening fact that the limit of
the integrals need not be the integral of the limits. Symbolically, we have a
case where
$$\lim_{k \to \infty} \left[ \int_0^1 f_k(x)\,dx \right] \ne \int_0^1 \left[ \lim_{k \to \infty} f_k(x) \right] dx.$$
Again, pointwise limits do not behave “nicely,” an analytic circumstance much to be regretted.
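Both halves of this comparison can be checked numerically. The Python sketch below is our own code, not from the text. It evaluates the spike functions and integrates each one with the trapezoid rule on the breakpoints $1/(2k)$, $3/(4k)$, $1/k$. Because each $f_k$ is linear between those breakpoints, the trapezoid rule is exact there up to rounding. The integrals stay at 1, while the values at any fixed $x > 0$ drop to 0.

```python
def spike(x: float, k: int) -> float:
    """The piecewise-linear function f_k on [0, 1] defined in the text."""
    if x <= 1 / (2 * k) or x >= 1 / k:
        return 0.0
    if x <= 3 / (4 * k):
        return 16 * k * k * x - 8 * k
    return -16 * k * k * x + 16 * k


def integral(k: int) -> float:
    """Integral of f_k over [0, 1] by the trapezoid rule on f_k's own breakpoints."""
    nodes = [0.0, 1 / (2 * k), 3 / (4 * k), 1 / k, 1.0]
    return sum((b - a) * (spike(a, k) + spike(b, k)) / 2 for a, b in zip(nodes, nodes[1:]))


for k in (1, 10, 100, 1000):
    print(k, integral(k), spike(0.3, k))   # integrals stay near 1; f_k(0.3) is 0 once k >= 4
```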
By 1841 Weierstrass understood this state of affairs and proposed a 
way around it [5]. Characteristically, he did not publish his ideas until 
1894—more than half a century later—but his students had spread the
word long before. The idea was to introduce a stronger form of conver- 
gence, called uniform convergence, under which key properties transfer 
from individual functions to their limit. 

Following his lead, we define a sequence of functions $f_k$ to converge
uniformly to a function f on a common domain if for every $\varepsilon > 0$, there is
a whole number N so that, if $k \ge N$ and if x is any point in the domain,
then $|f_k(x) - f(x)| < \varepsilon$. In a manner reminiscent of uniform continuity, this
says that “one N fits all x” in the domain of the functions $f_k$.

This mode of convergence can be illustrated geometrically. Given 
$\varepsilon > 0$, we draw a band extending $\varepsilon$ units above and below the graph of $y = f(x)$, as
shown in figure 9.6. By uniform convergence, we must reach a subscript
N so that $f_N$ and all subsequent functions in the sequence lie entirely
within this band. As the name suggests, such functions approximate f
uniformly across the interval [a, b].

It is easy to see that if a sequence of functions converges uniformly to 
f, then it converges pointwise to f, but not conversely. For example, the 
“spike” functions described above converge pointwise but not uniformly 
to the zero function on [0, 1]. Uniform convergence is a stronger, more 
restrictive phenomenon than mere pointwise convergence. 
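The failure of uniformity shows up in the largest distance between $f_k$ and the limit function. The Python sketch below estimates $\sup_{0 \le x \le 1} |f_k(x) - f(x)|$ on a fine grid. For the spikes, whose limit is f = 0, this distance grows like 4k instead of shrinking to 0. For contrast we add a hypothetical sequence $g_k(x) = x^2/k$ of our own choosing, which does converge uniformly to 0; its distance is 1/k. A grid maximum only estimates the supremum, but here the grid happens to contain each spike's peak at $3/(4k)$.

```python
def spike(x: float, k: int) -> float:
    """The f_k of the text as a clipped tent: 0 outside [1/(2k), 1/k], peak 4k at 3/(4k)."""
    return max(0.0, min(16 * k * k * x - 8 * k, -16 * k * k * x + 16 * k))


def sup_distance(fk, f, n: int = 100_000) -> float:
    """Grid estimate of the supremum of |fk(x) - f(x)| over [0, 1]."""
    return max(abs(fk(i / n) - f(i / n)) for i in range(n + 1))


def zero(x: float) -> float:
    return 0.0


for k in (1, 10, 100):
    spike_gap = sup_distance(lambda x: spike(x, k), zero)
    g_gap = sup_distance(lambda x: x * x / k, zero)   # hypothetical g_k(x) = x^2 / k
    print(k, spike_gap, g_gap)   # spikes: about 4k (not uniform); g_k: 1/k (uniform)
```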

We have undertaken this digression for a few reasons. First, we shall 
need the notion of uniform convergence in the chapter's main result. Sec- 
ond, echoes of these ideas appear throughout the remainder of the book. 
Finally, such considerations illustrate why Weierstrass is so important in 
the history of calculus. In the words of Victor Katz, 


Not only did Weierstrass make absolutely clear how certain quan- 
tities in his definition(s) depended on other quantities, but he also 
completed the transformation away from the use of terms such as 
“infinitely small.” Henceforth, all definitions involving such ideas 
were given arithmetically [6]. 


FOUR GREAT THEOREMS


Besides revisiting definitions, Weierstrass was a master at employing 
them to prove theorems of importance. Here we shall mention (but not 
prove) four of his results involving uniform convergence. 

The first two address a topic mentioned above: under uniform convergence,
important analytic properties transfer from the individual $f_k$ to
the limit function f.
Figure 9.6


Theorem 1: If $\{f_k\}$ is a sequence of continuous functions converging uniformly
to f on [a, b], then f itself is continuous.


Theorem 2: If $\{f_k\}$ is a sequence of bounded, Riemann-integrable functions
converging uniformly to f on [a, b], then f is Riemann-integrable
on [a, b] and
$$\lim_{k \to \infty} \left[ \int_a^b f_k(x)\,dx \right] = \int_a^b \left[ \lim_{k \to \infty} f_k(x) \right] dx = \int_a^b f(x)\,dx.$$


By theorem 2, the interchange of limits and integrals is permissible for 
uniformly converging sequences of functions. 

The third result is now called the Weierstrass approximation theorem. 
It provides a fortuitous connection between continuous functions and 
polynomials. 


Theorem 3 (Weierstrass approximation theorem): If f is a continuous 
function defined on a closed, bounded interval [a, b], then there exists 
a sequence of polynomials $P_k$ converging uniformly to f on [a, b].


What is so fascinating about this theorem is that continuous functions 
can be quite ill behaved (this, in fact, is the point of Weierstrass’s coun- 
terexample, which we examine in a moment). Polynomials, by contrast, 
are as tame as can be. That the latter uniformly approximate the former
seems a wonderful piece of good fortune. 
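Weierstrass's own proof is not reproduced here. A later constructive route, due to Sergei Bernstein (1912), makes Theorem 3 easy to experiment with on [0, 1]. The Bernstein polynomials $B_n(x) = \sum_{j=0}^{n} f(j/n) \binom{n}{j} x^j (1 - x)^{n-j}$ converge uniformly to any continuous f on [0, 1]. The Python sketch below (function names ours) applies them to the corner function $|x - 1/2|$ and estimates the maximum error on a grid.

```python
from math import comb


def bernstein(f, n: int, x: float) -> float:
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(f(j / n) * comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(n + 1))


def max_error(f, n: int, grid: int = 1000) -> float:
    """Grid estimate of max |f(x) - B_n(x)| over [0, 1]."""
    return max(abs(f(i / grid) - bernstein(f, n, i / grid)) for i in range(grid + 1))


def corner(x: float) -> float:
    return abs(x - 0.5)   # continuous on [0, 1], not differentiable at 1/2


for n in (10, 100, 400):
    print(n, max_error(corner, n))   # shrinks roughly like 1/sqrt(n)
```

The rate is slow for this corner, but uniform convergence only asks that the maximum error tend to 0.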

These three theorems, then, make the case for uniform convergence. 
They allow for the transfer of continuity and integrability from individual 
functions to their limit and provide a vehicle for approximating continu- 
ous functions by polynomials. But is there an easy way to establish uni- 
form convergence in the first place? 

One route is to apply the so-called Weierstrass M-test, the last of our 
preliminary results. As before, we begin with a sequence of functions $\{f_k\}$
defined on a common domain, but the M-test introduces a new twist: we add
these to create partial sums
$$S_n(x) = \sum_{k=1}^{n} f_k(x) = f_1(x) + f_2(x) + \cdots + f_n(x).$$
If the sequence of partial sums $\{S_n\}$ converges uniformly to a function f, we
say the infinite series of functions $\sum_{k=1}^{\infty} f_k(x)$ converges uniformly to
f. With this background, we now state the following result.


Theorem 4 (Weierstrass M-test): If a sequence $\{f_k\}$ of functions defined on a
common domain has the property that, for each k, there exists a positive
number $M_k$ so that $|f_k(x)| \le M_k$ for all x in the domain and if the infinite
series $\sum_{k=1}^{\infty} M_k$ converges, then the series of functions $\sum_{k=1}^{\infty} f_k(x)$ converges
uniformly.


This amounts to a comparison test between functions and numbers, 
where convergence of the series of numbers implies uniform convergence 
of the series of functions. For example, consider the function defined on 
[0, 1] by 


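$$f(x) = \sum_{k=0}^{\infty} \frac{x^k}{(k+1)^3} = 1 + \frac{x}{2^3} + \frac{x^2}{3^3} + \frac{x^3}{4^3} + \cdots.$$
Here we have $\left| \frac{x^k}{(k+1)^3} \right| \le \frac{1}{(k+1)^3} \le \frac{1}{(k+1)^2}$ for all x in [0, 1], and we know that
$$\sum_{k=0}^{\infty} \frac{1}{(k+1)^2} = \sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$$
by Euler's result from chapter 4. Uniform convergence follows immediately from the M-test.
Moreover, if we apply theorems 1 and 2 to the partial sums $S_n$, we know that f is itself
continuous because each of the partial sums is continuous, and that
$$\begin{aligned} \int_0^1 f(x)\,dx &= \int_0^1 \left[ \lim_{n \to \infty} S_n(x) \right] dx = \lim_{n \to \infty} \left[ \int_0^1 S_n(x)\,dx \right] \\ &= \lim_{n \to \infty} \int_0^1 \sum_{k=0}^{n} \frac{x^k}{(k+1)^3}\,dx = \lim_{n \to \infty} \sum_{k=0}^{n} \frac{1}{(k+1)^4} \\ &= \sum_{k=0}^{\infty} \frac{1}{(k+1)^4} = \sum_{k=1}^{\infty} \frac{1}{k^4} = \frac{\pi^4}{90}, \end{aligned}$$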


again with a little help from Euler. Here we have included all the interven- 
ing steps as a reminder of how complicated matters become when we 
interchange infinite processes. The Weierstrass M-test has allowed us to 
conclude that f is continuous and to evaluate its integral exactly—a pretty 
significant accomplishment. 
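These two conclusions can be spot-checked numerically. In the Python sketch below (helper names ours), a composite Simpson's rule applied to a long partial sum $S_n$ reproduces the term-by-term value $\sum 1/(k+1)^4$, which in turn sits close to $\pi^4/90$. The function `m_test_tail` records the uniform error bound that the M-test supplies with the text's $M_k = 1/(k+1)^2$:
$$\sup_{[0,1]} |f - S_n| \le \sum_{k>n} \frac{1}{(k+1)^2} < \frac{1}{n+1},$$
where the last step follows by telescoping.

```python
import math


def partial_sum(x: float, n: int) -> float:
    """S_n(x) = sum_{k=0}^{n} x^k / (k+1)^3."""
    return sum(x**k / (k + 1) ** 3 for k in range(n + 1))


def m_test_tail(n: int) -> float:
    """Upper bound on sup |f - S_n| over [0, 1], from M_k = 1/(k+1)^2."""
    return 1 / (n + 1)


def simpson(g, a: float, b: float, panels: int = 1000) -> float:
    """Composite Simpson's rule; `panels` must be even."""
    h = (b - a) / panels
    total = g(a) + g(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


n = 200
numeric = simpson(lambda x: partial_sum(x, n), 0.0, 1.0)
termwise = sum(1 / (k + 1) ** 4 for k in range(n + 1))
print(numeric, termwise, math.pi**4 / 90)
```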

At last the preliminaries are behind us, and the stage is set for a math- 
ematical bombshell. 


WEIERSTRASS’S PATHOLOGICAL FUNCTION 


Mathematicians long knew that a differentiable (“smooth”) function 
must be continuous (“unbroken”), but not conversely. A V-shaped function 
like y = |x|, for instance, is everywhere continuous but is not differentiable at 
x =0, where its graph abruptly changes direction to produce a corner. 

It was believed, however, that continuous functions must be smooth 
“most of the time.” The renowned André-Marie Ampère (1775-1836) had
presented a proof that continuous functions are differentiable in general, 
and calculus textbooks throughout the first half of the nineteenth century 
endorsed this position [7]. 

It certainly has appeal. Anyone can imagine a continuous “sawtooth” 
graph rising smoothly to a corner, then descending to the next corner, 
then rising to the next, and so on. As we compress the “teeth,” we get ever 
more points of nondifferentiability. Nonetheless, it seems that there must 
remain intervals where the graph rises or falls smoothly to get from one
corner to the next. In this way, the geometry suggests that any continuous 
function must have plenty of points of differentiability. 

It was thus a shock when Weierstrass constructed his function con- 
tinuous at every point but differentiable at none, a bizarre entity that 
seemed to be unbroken yet everywhere jagged. Regarded by most people 
as unimaginable, his function not only refuted Ampère’s “theorem” but
drove the last nail into the coffin of geometric intuition as a trustworthy 
foundation for the calculus. 

By all accounts, Weierstrass concocted his example in the 1860s and 
presented it to the Berlin Academy on July 18, 1872. As was his custom, 
he did not rush the discovery into print; it was first published by Paul du 
Bois-Reymond (1831-1889) in 1875. 

Needless to say, so peculiar a function is far from elementary. In terms 
of technical complexity, it is probably the most demanding result in this 
book. But its counterintuitive nature, not to mention its historical significance,
should make the effort worthwhile. Here we follow Weierstrass's
argument but modify his notation and add a detail now and then for the 
sake of clarity. 

We start with a lemma that Weierstrass would need later. He proved it 
with a trigonometric identity, but we present an argument using calculus. 


Lemma: If B > 0, then
$$\left| \frac{\cos(A\pi + B\pi) - \cos(A\pi)}{B} \right| \le \pi.$$


Proof: Let $h(x) = \cos(\pi x)$ over the interval $[A, A + B]$. By the mean
value theorem, there is a point c between A and A + B such that
$$\frac{h(A + B) - h(A)}{(A + B) - A} = h'(c).$$
This amounts to
$$\frac{\cos(A\pi + B\pi) - \cos(A\pi)}{B} = -\pi \sin(c\pi),$$
and it follows that
$$\left| \frac{\cos(A\pi + B\pi) - \cos(A\pi)}{B} \right| = |-\pi \sin(c\pi)| \le \pi \cdot 1 = \pi.$$
Q.E.D.
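A quick random test agrees with the lemma. The Python sketch below is seeded for reproducibility, and its sampling ranges are arbitrary. It evaluates the difference quotient for many pairs with B > 0 and records the largest value. That value creeps up toward π without exceeding it, apart from rounding error.

```python
import math
import random


def lemma_ratio(A: float, B: float) -> float:
    """|cos(A*pi + B*pi) - cos(A*pi)| / B, which the lemma bounds by pi when B > 0."""
    return abs(math.cos(A * math.pi + B * math.pi) - math.cos(A * math.pi)) / B


rng = random.Random(2005)   # fixed seed so the run is reproducible
worst = max(lemma_ratio(rng.uniform(-50, 50), rng.uniform(1e-3, 5)) for _ in range(20_000))
print(worst, math.pi)
```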


We now introduce, in his own words, Weierstrass's famous counterexample.


Theorem: If $a \ge 3$ is an odd integer and if b is a constant strictly between
0 and 1 such that $ab > 1 + 3\pi/2$, then the function
$$f(x) = \sum_{k=0}^{\infty} b^k \cos(a^k \pi x)$$
is everywhere continuous and nowhere differentiable [8].
This can be done, for example, as follows.
Let x be a real variable, a an odd integer, b a positive constant less than 1, and
$$f(x) = \sum_{n=0}^{\infty} b^n \cos(a^n x \pi);$$
then f(x) is a continuous function of which it can be shown that, as soon as
the value of the product ab exceeds a certain bound, it possesses at no point
a definite differential quotient.


Weierstrass’s pathological function (1872) 


Proof: Obviously, he had done plenty of legwork before placing these 
strange restrictions upon a and b. To simplify the discussion, we shall 
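let $a = 21$ and $b = 1/3$. These choices satisfy the stated conditions
because $a = 21 \ge 3$ is an odd integer, b lies in (0, 1), and $ab = 7 > 1 + 3\pi/2 \approx 5.71$.
Consequently, our specific function will be
$$f(x) = \sum_{k=0}^{\infty} \frac{\cos(21^k \pi x)}{3^k} = \cos(\pi x) + \frac{\cos(21\pi x)}{3} + \frac{\cos(441\pi x)}{9} + \cdots. \qquad (3)$$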


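To prove the continuity of f, we need only apply the M-test. Clearly
$\left| \frac{\cos(21^k \pi x)}{3^k} \right| \le \frac{1}{3^k}$, and $\sum_{k=0}^{\infty} \frac{1}{3^k}$ converges to 3/2. Therefore, the series
$\sum_{k=0}^{\infty} \frac{\cos(21^k \pi x)}{3^k}$ converges uniformly to f. Because each summand $\frac{\cos(21^k \pi x)}{3^k}$ is
continuous everywhere, so is f by theorem 1 above.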
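Numerically, the M-test bound means that the partial sums $S_n(x) = \sum_{k=0}^{n} \cos(21^k \pi x)/3^k$ never stray from f by more than $\sum_{k>n} 3^{-k} = \frac{1}{2} \cdot 3^{-n}$, whatever x is. There is one practical wrinkle. The argument $21^k \pi x$ quickly grows too large for a floating-point cosine to be trustworthy. The Python sketch below (helper names ours) therefore takes x rational and reduces $21^k x$ modulo 2 exactly before calling `math.cos`.

```python
import math
from fractions import Fraction


def cos_pi(t: Fraction) -> float:
    """cos(pi * t) for rational t, reducing t modulo 2 exactly before using floats."""
    return math.cos(math.pi * float(t % 2))


def S(x: Fraction, n: int) -> float:
    """Partial sum S_n(x) = sum_{k=0}^{n} cos(21^k * pi * x) / 3^k of series (3)."""
    return sum(cos_pi(21**k * x) / 3**k for k in range(n + 1))


def tail_bound(n: int) -> float:
    """M-test bound on sup |f - S_n|: sum_{k>n} 1/3^k = (1/2) * (1/3)^n."""
    return 0.5 / 3**n


x = Fraction(3, 10)
reference = S(x, 40)   # stand-in for f(x); its own error is below 1e-19
for n in range(6):
    print(n, round(S(x, n), 10), abs(reference - S(x, n)) <= tail_bound(n))
```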


We seem to be halfway to showing that f is everywhere continuous 
and nowhere differentiable. However, proving the “nowhere differen- 
tiable” part is much, much more difficult. To this end, we begin by fixing 
a real number r. Our goal is to show that f’(r) does not exist. Because r 
is arbitrary, this will establish that f is differentiable at no point whatever. 

In following Weierstrass's logic, it will be helpful to assemble a
number of observations about seemingly unrelated matters. Rest 
assured that each will play a role somewhere in his grand production. 

First, Weierstrass noted that for each $m = 1, 2, 3, \ldots$, the real number
$21^m r$ (like any real number) falls within half a unit of its nearest integer.
Thus, for each whole number m, there exists an integer $\alpha_m$ such that
$\alpha_m - \frac{1}{2} < 21^m r \le \alpha_m + \frac{1}{2}$ (see figure 9.7). Letting $\varepsilon_m = 21^m r - \alpha_m$ be
the associated gap, we see that
$$\alpha_m + \varepsilon_m = 21^m r. \qquad (4)$$

Figure 9.7


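Because $-\frac{1}{2} < \varepsilon_m \le \frac{1}{2}$, it follows that $0 < \frac{1/2}{21^m} \le \frac{1 - \varepsilon_m}{21^m} < \frac{3/2}{21^m}$.

For notational ease, we introduce $h_m = \frac{1 - \varepsilon_m}{21^m}$ and observe that
$$21^m h_m = 1 - \varepsilon_m \quad \text{and} \quad \frac{1}{h_m} > \frac{2 \times 21^m}{3}. \qquad (5)$$
Now, $0 < \frac{1/2}{21^m} \le h_m < \frac{3/2}{21^m}$ guarantees that $\lim_{m \to \infty} h_m = 0$ by the
squeezing theorem. The sequence of positive terms $\{h_m\}$ will be decisive
in establishing nondifferentiability.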
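The bookkeeping in (4) and (5) is easy to mechanize with exact rational arithmetic. In the Python sketch below (names ours), $\alpha_m$ is computed as $\lceil 21^m r - \frac{1}{2} \rceil$. That is precisely the integer satisfying $\alpha_m - \frac{1}{2} < 21^m r \le \alpha_m + \frac{1}{2}$. The assertions then restate (4), the range of $\varepsilon_m$, and (5) for a few arbitrary rational r.

```python
import math
from fractions import Fraction


def alpha_eps_h(r: Fraction, m: int) -> tuple[int, Fraction, Fraction]:
    """Return (alpha_m, epsilon_m, h_m) for the real number r, as in (4) and (5)."""
    t = 21**m * r
    alpha = math.ceil(t - Fraction(1, 2))   # alpha - 1/2 < t <= alpha + 1/2
    eps = t - alpha
    h = (1 - eps) / 21**m
    return alpha, eps, h


for r in (Fraction(3, 10), Fraction(1, 42), Fraction(-7, 3)):
    for m in range(1, 6):
        alpha, eps, h = alpha_eps_h(r, m)
        assert alpha + eps == 21**m * r                       # (4)
        assert Fraction(-1, 2) < eps <= Fraction(1, 2)
        assert 21**m * h == 1 - eps                           # (5), first part
        assert 1 / h > Fraction(2 * 21**m, 3)                 # (5), second part
        assert Fraction(1, 2 * 21**m) <= h < Fraction(3, 2 * 21**m)
print(alpha_eps_h(Fraction(3, 10), 1))
```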

At this point, we (temporarily) fix the integer m. As did Weier- 
strass, we use (3) and consider the differential quotient: 


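$$\begin{aligned} \frac{f(r + h_m) - f(r)}{h_m} &= \sum_{k=0}^{\infty} \frac{\dfrac{\cos(21^k \pi [r + h_m])}{3^k} - \dfrac{\cos(21^k \pi r)}{3^k}}{h_m} \\ &= \sum_{k=0}^{m-1} \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m} + \sum_{k=m}^{\infty} \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m}. \end{aligned} \qquad (6)$$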


Here, the infinite series has been broken into two parts. Weierstrass 
would consider the absolute value of each separately. 
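For the first series, we apply the lemma with $A = 21^k r$ and
$B = 21^k h_m$ to bound each summand as follows (note that $3^k = 21^k/7^k$):
$$\left| \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m} \right| = 7^k \left| \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{21^k h_m} \right| \le 7^k \pi.$$
Thus, by the triangle inequality, we have an upper bound for the first
sum:
$$\begin{aligned} \left| \sum_{k=0}^{m-1} \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m} \right| &\le \sum_{k=0}^{m-1} \left| \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m} \right| \\ &\le \sum_{k=0}^{m-1} 7^k \pi = \pi (1 + 7 + 49 + \cdots + 7^{m-1}) = \frac{7^m - 1}{6} \pi < \frac{\pi}{6} 7^m. \end{aligned} \qquad (7)$$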


The second series in (6) presents a greater challenge. We 
approach the task by making four pertinent observations: 


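(A) If $k \ge m$, we see by (4) and (5) that
$$\begin{aligned} 21^k \pi r + 21^k \pi h_m &= 21^{k-m} \pi [21^m r + 21^m h_m] \\ &= 21^{k-m} \pi [(\alpha_m + \varepsilon_m) + (1 - \varepsilon_m)] \\ &= 21^{k-m} \pi [\alpha_m + 1]. \end{aligned}$$
But $21^{k-m}$ is an odd integer and $\alpha_m$ is an integer as well. Thus
$21^{k-m} \pi [\alpha_m + 1]$ is an even or odd integer multiple of π depending
on whether $\alpha_m + 1$ is even or odd. It follows that
$$\cos(21^k \pi r + 21^k \pi h_m) = \cos(21^{k-m} \pi [\alpha_m + 1]) = (-1)^{\alpha_m + 1}.$$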


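(B) Again we stipulate that $k \ge m$ and apply (4) to get
$21^k \pi r = 21^{k-m} \pi (21^m r) = 21^{k-m} \pi (\alpha_m + \varepsilon_m)$. By a familiar trig identity we have
$$\begin{aligned} \cos(21^k \pi r) &= \cos(21^{k-m} \pi \alpha_m + 21^{k-m} \pi \varepsilon_m) \\ &= \cos(21^{k-m} \pi \alpha_m) \cdot \cos(21^{k-m} \pi \varepsilon_m) - \sin(21^{k-m} \pi \alpha_m) \cdot \sin(21^{k-m} \pi \varepsilon_m). \end{aligned}$$
Here $21^{k-m} \pi \alpha_m$ is an integral multiple of π whose parity depends on
$\alpha_m$, and so
$$\cos(21^k \pi r) = (-1)^{\alpha_m} \cdot \cos(21^{k-m} \pi \varepsilon_m) - 0 \cdot \sin(21^{k-m} \pi \varepsilon_m) = (-1)^{\alpha_m} \cdot \cos(21^{k-m} \pi \varepsilon_m).$$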


(C) (An easy one) By the nature of cosine, $1 + \cos(21^{k-m} \pi \varepsilon_m) \ge 0$.


(D) Because $-\frac{1}{2} < \varepsilon_m \le \frac{1}{2}$, we know that $-\frac{\pi}{2} < \pi \varepsilon_m \le \frac{\pi}{2}$, and
so $\cos(\pi \varepsilon_m) \ge 0$.
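Observations (A), (B), and (D) can be spot-checked for particular r and m. The Python sketch below is our own. It reduces cosine arguments modulo 2 exactly, so that huge arguments stay accurate. It confirms three things for each sample. First, $21^k(r + h_m)$ is the integer $21^{k-m}(\alpha_m + 1)$. Second, the identity in (B) holds to rounding error. Third, $\cos(\pi \varepsilon_m) \ge 0$. The sample values of r are arbitrary.

```python
import math
from fractions import Fraction


def cos_pi(t: Fraction) -> float:
    """cos(pi * t) for rational t, reducing t modulo 2 exactly before using floats."""
    return math.cos(math.pi * float(t % 2))


def check_observations(r: Fraction, m: int, k_max: int) -> None:
    """Assert observations (A), (B), and (D) for k = m, ..., k_max."""
    t = 21**m * r
    alpha = math.ceil(t - Fraction(1, 2))   # alpha_m
    eps = t - alpha                         # epsilon_m, from (4)
    h = (1 - eps) / 21**m                   # h_m, from (5)
    for k in range(m, k_max + 1):
        # (A): the argument is the integer 21^(k-m) * (alpha_m + 1), so cos = (-1)^(alpha_m + 1)
        arg = 21**k * (r + h)
        assert arg == 21 ** (k - m) * (alpha + 1)
        assert cos_pi(arg) == (-1) ** ((alpha + 1) % 2)
        # (B): cos(21^k pi r) = (-1)^alpha_m * cos(21^(k-m) pi eps_m)
        lhs = cos_pi(21**k * r)
        rhs = (-1) ** (alpha % 2) * cos_pi(21 ** (k - m) * eps)
        assert math.isclose(lhs, rhs, abs_tol=1e-12)
    # (D): -pi/2 < pi * eps_m <= pi/2, hence cos(pi * eps_m) >= 0
    assert cos_pi(eps) >= 0


for r in (Fraction(3, 10), Fraction(1, 42), Fraction(2, 11)):
    for m in range(1, 6):
        check_observations(r, m, m + 8)
print("observations (A), (B), (D) hold for the sampled r and m")
```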

We now apply (A) and (B) to get a lower bound for the absolute 
value of the second series in (6): 


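$$\begin{aligned} \left| \sum_{k=m}^{\infty} \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m} \right| &= \left| \sum_{k=m}^{\infty} \frac{(-1)^{\alpha_m + 1} - (-1)^{\alpha_m} \cos(21^{k-m} \pi \varepsilon_m)}{3^k h_m} \right| \\ &= \left| \sum_{k=m}^{\infty} \frac{(-1)^{\alpha_m + 1} [1 + \cos(21^{k-m} \pi \varepsilon_m)]}{3^k h_m} \right| \\ &= \left| (-1)^{\alpha_m + 1} \right| \cdot \left| \sum_{k=m}^{\infty} \frac{1 + \cos(21^{k-m} \pi \varepsilon_m)}{3^k h_m} \right| \\ &= \sum_{k=m}^{\infty} \frac{1 + \cos(21^{k-m} \pi \varepsilon_m)}{3^k h_m}, \end{aligned}$$
because each term of the series is nonnegative by (C).
This sum of nonnegative terms is at least as large as its first term
(where $k = m$), so by (D) and (5), we have
$$\left| \sum_{k=m}^{\infty} \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m} \right| \ge \frac{1 + \cos(\pi \varepsilon_m)}{3^m h_m} \ge \frac{1}{3^m h_m} > \frac{1}{3^m} \cdot \frac{2 \times 21^m}{3} = \frac{2}{3} 7^m.$$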
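All of this has been a vast overture before the main performance.
Weierstrass now derived the critical inequality, one that began with
the result just proved and ended with a telling bound on the differential
quotient:
$$\begin{aligned} \frac{2}{3} 7^m &< \left| \sum_{k=m}^{\infty} \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m} \right| \\ &= \left| \frac{f(r + h_m) - f(r)}{h_m} - \sum_{k=0}^{m-1} \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m} \right| && \text{by (6)} \\ &\le \left| \frac{f(r + h_m) - f(r)}{h_m} \right| + \left| \sum_{k=0}^{m-1} \frac{\cos(21^k \pi r + 21^k \pi h_m) - \cos(21^k \pi r)}{3^k h_m} \right| \\ &< \left| \frac{f(r + h_m) - f(r)}{h_m} \right| + \frac{\pi}{6} 7^m && \text{by (7)}. \end{aligned}$$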


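From the first and last terms of this string of inequalities, we deduce that
$$\left| \frac{f(r + h_m) - f(r)}{h_m} \right| > \frac{2}{3} 7^m - \frac{\pi}{6} 7^m = \left( \frac{2}{3} - \frac{\pi}{6} \right) 7^m. \qquad (8)$$
Two features of expression (8) are critical. First, the quantity
$\frac{2}{3} - \frac{\pi}{6} \approx 0.14307$ is a positive constant. Second, the inequality in (8)
holds for our fixed, but arbitrary, whole number m. With this in mind,
we now “unfix” m and take a limit:
$$\lim_{m \to \infty} \left| \frac{f(r + h_m) - f(r)}{h_m} \right| \ge \lim_{m \to \infty} \left( \frac{2}{3} - \frac{\pi}{6} \right) 7^m = \infty.$$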


But we noted above that h,, 20 as m—-e. Therefore, f’(r) = 


oe f@~ 


ue 


i Beets 


moo 


m 


cannot exist as a finite quantity. In short (short?), 
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fis not differentiable at x =r. And because r was an unspecified real 
number, we have confirmed that Weierstrasss function, although 
everywhere continuous, is nowhere differentiable. Q.E.D. 
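To watch (8) at work numerically, here is a small Python sketch for the single point r = 0. It assumes the setup that (A) through (D) rely on: $\alpha_m$ is the integer nearest $21^m r$, $\varepsilon_m = 21^m r - \alpha_m$, and $h_m = (1 - \varepsilon_m)/21^m$, so that at r = 0 we have $\alpha_m = \varepsilon_m = 0$ and $h_m = 1/21^m$. The function name `quotient_at_zero` is our own. Because $21^{k-m}$ is odd for k ≥ m, every tail term equals $-2/3^k$ exactly, so the infinite series can be summed without truncation.

```python
import math

def quotient_at_zero(m: int) -> float:
    """[f(h_m) - f(0)] / h_m for f(x) = sum_{k>=0} cos(21^k pi x) / 3^k,
    at r = 0, where (assumed setup) alpha_m = eps_m = 0 and h_m = 1 / 21^m."""
    h = 1.0 / 21**m
    # Terms k < m: cos(pi / 21^(m-k)) - 1, written as -2 sin^2(...) to avoid cancellation.
    head = sum(-2.0 * math.sin(math.pi / (2 * 21 ** (m - k))) ** 2 / 3**k
               for k in range(m))
    # Terms k >= m: 21^(k-m) is odd, so cos(21^(k-m) pi) - 1 = -2 exactly, and
    # the geometric tail sum_{k>=m} -2 / 3^k equals -1 / 3^(m-1).
    tail = -1.0 / 3 ** (m - 1)
    return (head + tail) / h

C = 2 / 3 - math.pi / 6   # the positive constant in (8)

for m in range(1, 7):
    q = quotient_at_zero(m)
    print(m, round(q, 2), abs(q) >= C * 7**m)
```

The printed quotients grow roughly like $3\cdot 7^m$, comfortably above the bound $(2/3 - \pi/6)\,7^m$ from (8). The single point r = 0 is only an illustration; the proof covers every real r.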


Once the reader catches his or her breath, a number of reactions are 
likely. One is sheer amazement at Weierstrass’s abilities. The talent
involved in putting this proof together is quite extraordinary. 

Another may be a sense of discomfort, for we have just verified that a 
continuous function may have no point of differentiability. Nowhere does 
its graph rise or fall smoothly. Nowhere does its graph have a tangent line. 
This is a function every point of which behaves like a sharp corner, yet 
which remains continuous throughout. 

Would a picture of y = f(x) be illuminating? Unfortunately, because f
is an infinite series of functions, we must be content with graphing a
partial sum. We do just that in figure 9.8 with a graph of the third partial sum

$$\sum_{k=0}^{2}\frac{\cos(21^k\pi x)}{3^k} = \cos(\pi x) + \frac{\cos(21\pi x)}{3} + \frac{\cos(441\pi x)}{9}.$$


This reveals a large number of direction changes and some very steep ris- 
ing and falling behavior, but no sharp angles. Indeed, any partial sum of 
Weierstrass’s function, comprising finitely many cosines, is differentiable 
everywhere. No matter which partial sum we graph, we find not a single 
corner. Yet, when we pass to the limit to generate f itself, corners must 
appear everywhere. Weierstrass’s function lies somewhere beyond intuition,
far removed from geometrical diagrams that can be sketched on a
blackboard. Yet its existence has been unquestionably established in the 
proof above. 
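The claim that every partial sum is differentiable can be checked directly. Differentiating term by term and using $21^k/3^k = 7^k$, the derivative of the partial sum through k = n is $-\pi\sum_{k=0}^{n} 7^k \sin(21^k\pi x)$. The Python sketch below (function names are our own) evaluates the third partial sum shown above and compares this formula with a central difference quotient.

```python
import math

def partial_sum(x: float, n: int) -> float:
    """Sum of cos(21^k pi x) / 3^k for k = 0, ..., n."""
    return sum(math.cos(21**k * math.pi * x) / 3**k for k in range(n + 1))

def partial_sum_derivative(x: float, n: int) -> float:
    """Term-by-term derivative: -pi * sum of 7^k sin(21^k pi x) for k = 0, ..., n."""
    return -math.pi * sum(7**k * math.sin(21**k * math.pi * x) for k in range(n + 1))

def central_difference(x: float, n: int, h: float = 1e-6) -> float:
    """Numerical slope of the partial sum at x."""
    return (partial_sum(x + h, n) - partial_sum(x - h, n)) / (2 * h)

for x in (0.1, 0.3, 0.77):
    print(x, partial_sum_derivative(x, 2), central_difference(x, 2))
```

The two slopes agree closely at each sample point, as they must for a finite sum of cosines. Raising n makes the slopes steeper because the factor $7^k$ grows without bound, yet no finite n produces a corner.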

A final reaction to this argument should be applause for its high stan- 
dard of rigor. Like a maestro conducting a great orchestra, Weierstrass 
blended the fundamental definitions, the absolute values, and a host of 
inequalities into a coherent whole. Nothing was left to chance, nothing to 
intuition. For later generations of analysts, the ultimate compliment was 
to say that a proof exhibited “Weierstrassian rigor.” 

To be sure, not everyone was thrilled by a function so pathological. 
Some critics reacted against a mathematical world where inequalities 
trumped intuition. Charles Hermite, whom we met in the previous chap- 
ter, famously bemoaned the discovery in these words: “I turn away with 
fright and horror from this lamentable evil of functions that do not have 
derivatives” [9]. Henri Poincaré (1854-1912) called Weierstrass’s example 
“an outrage against common sense” [10]. And the exasperated Émile Picard
(1856-1941) wrote: “If Newton and Leibniz had thought that continuous 
functions do not necessarily have a derivative . . . the differential calculus 
would never have been invented” [11]. As though cast out of Eden, these 
mathematicians believed that paradise—in the form of an intuitive, geo- 
metric foundation for calculus—had been lost forever. 

But Weierstrass’s logic was ironclad. Short of abandoning the defini- 
tions of limit, continuity, and differentiability, or of denying analysts the 
right to introduce infinite processes, the critics were doomed. If some- 
thing like a continuous, nowhere-differentiable function was intuitively 
troubling, then scholars needed to modify their intuitions rather than 
abandon their mathematics. Analytic rigor, advancing since Cauchy, 
reached a new pinnacle with Weierstrass. Like it or not, there was no turn- 
ing back. 

In a continuing ebb and flow, mathematicians develop grand theories 
and then find pertinent counterexamples to reveal the boundaries of their 
ideas. This juxtaposition of theory and counterexample is the logical 
engine by which mathematics progresses, for it is only by knowing how 
properties fail that we can understand how they work. And it is only by 
seeing how intuition misleads that we can truly appreciate the power of 
reason. 


CHAPTER 10 




Second Interlude 


Our story has reached the year 1873, nearly a century after the passing
of Euler and two centuries after the creation of the calculus. By that date, the
work of Cauchy, Riemann, and Weierstrass was sufficient to silence any 
latter-day Berkeley who might happen along. Was there anything left to do? 

The answer, of course, is . . . “Of course.” As mathematicians grappled 
with ideas like continuity and integrability, their very successes raised 
additional questions that were intriguing, troubling, or both. Weierstrass’s
pathological function was the most famous of many peculiar examples
that suggested avenues for future research. Here we shall consider a few 
others, each of which will figure in the book’s remaining chapters. 

Our first is the so-called “ruler function,” a simple but provocative 
example that appeared in a work of Johannes Karl Thomae (1840-1921) 
from 1875. He introduced it with this preamble: “Examples of integrable 
functions that are continuous or are discontinuous at individual points are 
plentiful, but it is important to identify integrable functions that are dis- 
continuous infinitely often” [1]. 

His function was defined on the open interval (0, 1) by 


$$r(x) = \begin{cases} 1/q & \text{if } x = p/q \text{ in lowest terms},\\ 0 & \text{if } x \text{ is irrational}. \end{cases}$$


Thus, r(1/5) = r(2/5) = r(4/10) = 1/5, whereas r(√2/6) = r(1/√2) = 0. Figure
10.1 displays the portion of its graph above y = 1/7; below this, the scat- 
tered points become impossibly dense. The graph suggests the vertical 
markings on a ruler—hence the name. 

With the ε-δ definition from the previous chapter, it is easy to prove
the following lemma.


Lemma: If a is any point in (0, 1), then $\lim_{x\to a} r(x) = 0$.


Proof: For ε > 0, we choose a whole number N with 1/N < ε. The proof
rests upon the observation that in (0, 1) there are only finitely many
rationals in lowest terms whose denominators are N or smaller. For
example, the only such fractions with denominators 5 or smaller are
1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 2/5, 3/5, and 4/5. Because this collection is
finite, we can find a positive number δ small enough that the interval
(a − δ, a + δ) lies within (0, 1) and contains none of these fractions,
except possibly a itself. We now choose any x with 0 < |x − a| < δ
and consider two cases. If x = p/q is a rational in lowest terms, then
|r(x) − 0| = |r(p/q)| = 1/q < 1/N < ε, because q must be greater than N if
p/q ≠ a is in (a − δ, a + δ). Alternately, if x is irrational, then |r(x) − 0|
= 0 < ε as well. In either case, for ε > 0, we have found a δ > 0 so that,
if 0 < |x − a| < δ, then |r(x) − 0| < ε. By definition, $\lim_{x\to a} r(x) = 0$.
Q.E.D.

[Figure 10.1: graph of y = r(x)]
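The lemma's proof is constructive enough to run. In the Python sketch below, which handles only rational points a because Python's `fractions.Fraction` cannot represent irrationals, `delta_for` picks N with 1/N < ε, lists the finitely many fractions with denominator at most N, and takes δ to be the smallest distance from a to any of them other than a itself (and to the ends of (0, 1)). The helper names are our own.

```python
import math
from fractions import Fraction

def ruler(x: Fraction) -> Fraction:
    """Thomae's ruler function at a rational x in (0, 1).
    Fraction keeps x in lowest terms, so x.denominator is the q of the text."""
    return Fraction(1, x.denominator)

def fractions_up_to(N: int) -> set:
    """All rationals in (0, 1) whose lowest-terms denominator is at most N."""
    return {Fraction(p, q) for q in range(2, N + 1) for p in range(1, q)}

def delta_for(a: Fraction, eps: Fraction) -> Fraction:
    """A delta that works in the lemma for this rational a and this eps."""
    N = math.floor(1 / eps) + 1            # a whole number with 1/N < eps
    others = fractions_up_to(N) - {a}      # "except possibly a itself"
    # Keep (a - delta, a + delta) inside (0, 1) and away from every other such fraction.
    return min([a, 1 - a] + [abs(a - c) for c in others])

print(sorted(fractions_up_to(5)))          # the nine fractions listed in the proof
print(delta_for(Fraction(1, 3), Fraction(1, 10)))
```

Any rational x with 0 < |x − a| < δ then has denominator greater than N, so ruler(x) < ε, exactly as in the proof.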


With the lemma behind us, we can demonstrate the ruler function’s
most astonishing property: it is continuous at each irrational in (0, 1) yet
discontinuous at each rational in (0, 1). This follows immediately because,
if a is irrational, then $r(a) = 0 = \lim_{x\to a} r(x)$ by the lemma, which is precisely Cauchy’s
definition of continuity at a. On the other hand, if a = p/q is a rational in
lowest terms, then

$$r(a) = r(p/q) = 1/q \neq 0 = \lim_{x\to a} r(x),$$

and so the ruler function is discontinuous at a.

This presents us with a bizarre situation: the function is continuous 
(which our increasingly unreliable intuition regards as “unbroken”) at 
irrational points but discontinuous (“broken”) at rational ones. Most of 
us find it impossible to envision how the continuity/discontinuity 
points can be so intertwined. But the mathematics above is unambigu- 
ous. 

It will be helpful to extend the domain of the ruler function from 
(0, 1) to all real numbers. This is done by letting our new function take 
the value 1 at each integer and putting copies of r on each subinterval 
(1,2), (2,3), and so on. More precisely, we define the extended ruler 
function R by 


$$R(x) = \begin{cases} 1 & \text{if } x \text{ is an integer},\\ r(x-n) & \text{if } n < x < n+1 \text{ for some integer } n \ge 0,\\ r(x+n+1) & \text{if } -(n+1) < x < -n \text{ for some integer } n \ge 0. \end{cases}$$

As above, we have $\lim_{x\to a} R(x) = 0$ for any real number a, and so R is
continuous at each irrational and discontinuous at each rational.

The ruler function raises a natural question: “How can we flip-flop 
roles and create a function that is continuous at each rational and discon- 
tinuous at each irrational?” Although simple to state, this has a profound, 
and profoundly intriguing, answer. It will be the main topic in our 
upcoming chapter on Vito Volterra. 

The ruler function R is also remarkable because, its infinitude of dis- 
continuities notwithstanding, it is integrable over [0, 1]. That, of course, is 
the essence of Thomae’s preamble above. To prove it, we use Riemann’s
integrability condition from chapter 7. 

Begin with a value of d > 0 and a fixed oscillation σ > 0. We then
choose a whole number N such that 1/N < σ. As in the argument above,
we know that [0, 1] contains only finitely many rationals in lowest
terms for which R(p/q) ≥ 1/N, namely those with denominators no
greater than N. We let M be the number of such rationals and partition
[0, 1] so that each of these lies within a subinterval of width d/2M.
These will be what we called the Type A subintervals, that is, those
where the function oscillates more than σ. Using Riemann’s terminology,
we have

$$s(\sigma) = \sum_{\text{Type A}} (\text{width}) \le \sum_{\text{Type A}} \frac{d}{2M} \le M\left(\frac{d}{2M}\right) = \frac{d}{2},$$

so that s(σ) → 0 as d → 0. This is exactly what Riemann needed to establish
integrability. In other words, $\int_0^1 R(x)\,dx$ exists. Further, knowing that the
integral exists, we can easily show that $\int_0^1 R(x)\,dx = 0$.
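Riemann's criterion is not the only way to see this. The Python sketch below swaps in upper sums of the kind associated with Darboux, who appears later in this chapter. On n equal subintervals of [0, 1], the supremum of R on [i/n, (i + 1)/n] is 1/q for the least denominator q of a fraction in that subinterval, while every lower sum is 0 because irrationals lie everywhere. Upper sums that shrink toward 0 are consistent with the value $\int_0^1 R(x)\,dx = 0$. The function names are ours, and the code illustrates rather than proves the result.

```python
from fractions import Fraction

def least_denominator(i: int, n: int) -> int:
    """Least q >= 1 such that some fraction p/q lies in [i/n, (i+1)/n]."""
    q = 1
    while True:
        p = -(-i * q // n)              # ceil(i*q/n): smallest p with p/q >= i/n
        if p * n <= (i + 1) * q:        # ...and p/q <= (i+1)/n
            return q
        q += 1                          # stops by q = n at the latest (p = i)

def upper_sum(n: int) -> Fraction:
    """Upper sum of the extended ruler function R over [0, 1] with n equal pieces.
    A fraction of least denominator is automatically in lowest terms, so the
    supremum of R on each piece is 1/q (R(0) = R(1) = 1 corresponds to q = 1)."""
    return sum((Fraction(1, n * least_denominator(i, n)) for i in range(n)), Fraction(0))

for n in (10, 100, 1000):
    print(n, float(upper_sum(n)))
```

Because the partitions with n = 100 and n = 1000 refine the one with n = 10, the printed upper sums cannot increase along this list.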


It should be plain that the ruler function plays the same role as Rie- 
mann’s pathological function from chapter 7. Both are discontinuous infi- 
nitely often, yet both are integrable. The major difference between them is 
the ruler function’s relative simplicity, and, under the circumstances, a lit- 
tle simplicity is nothing to be sneered at. 

There is an intriguing question raised by these examples. We recall 
that Dirichlet’s function was everywhere discontinuous and not Riemann 
integrable. By contrast, the ruler function is discontinuous only on the 
rationals. This, to be sure, is awfully discontinuous, but the function still 
possesses enough continuity to allow it to be integrated. With such evi- 
dence, mathematicians conjectured that a Riemann-integrable function 
could be discontinuous, but not too discontinuous. Coming to grips with 
the continuity/integrability issue would occupy analysts for the remain- 
der of the nineteenth century. As we shall see in the book’s final chapter, 
this matter was addressed, and ultimately resolved, by Henri Lebesgue 
in 1904. 

Our next three examples are interrelated and so can be treated together.
Like the ruler function, these are fixtures in most analysis textbooks
because of their surprising properties.

First, we define

$$S(x) = \begin{cases} \cos(1/x) & \text{if } x \neq 0,\\ 0 & \text{if } x = 0, \end{cases}$$

and graph it in figure 10.2. As x approaches zero, its reciprocal 1/x grows
without bound, causing cos(1/x) to gyrate from −1 to 1 and back again
infinitely often in any neighborhood of the origin. To say that S oscillates
wildly is an understatement.

[Figure 10.2]

We show that $\lim_{x\to 0} S(x)$ does not exist by introducing the sequence
{1/kπ} for k = 1, 2, 3, ... and looking at the corresponding points on
the graph. As indicated in figure 10.2, we are alternately selecting the
crests and valleys of our function. That is, $\lim_{k\to\infty} 1/(k\pi) = 0$, but

$$\lim_{k\to\infty} S\!\left(\frac{1}{k\pi}\right) = \lim_{k\to\infty}\cos(k\pi) = \lim_{k\to\infty}(-1)^k.$$

Because this last limit does not exist, neither does $\lim_{x\to 0} S(x)$, which in
turn means that S is discontinuous at x = 0.
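The sequence argument is easy to reproduce numerically. This Python sketch (the names are ours) evaluates S at the points 1/(kπ) and shows the values alternating between −1 and 1 even as the points themselves approach 0.

```python
import math

def S(x: float) -> float:
    """S(x) = cos(1/x) for x != 0, and S(0) = 0."""
    return math.cos(1 / x) if x != 0 else 0.0

points = [1 / (k * math.pi) for k in range(1, 9)]
for k, x in enumerate(points, start=1):
    # S(1/(k pi)) = cos(k pi) = (-1)^k, up to rounding
    print(k, round(x, 5), round(S(x), 6))
```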


A related function is

$$T(x) = \begin{cases} x\sin(1/x) & \text{if } x \neq 0,\\ 0 & \text{if } x = 0, \end{cases}$$

which is graphed in figure 10.3. Because of the multiplier x, the infinitely
many oscillations of T damp out as we approach the origin.

At any nonzero point, T is the product of the continuous functions y = x
and y = sin(1/x) and so is itself continuous. Because −|x| ≤ x sin(1/x) ≤ |x|
and $\lim_{x\to 0}(-|x|) = 0 = \lim_{x\to 0}|x|$, the squeezing theorem guarantees that
$\lim_{x\to 0} T(x) = 0 = T(0)$, so T is continuous at x = 0 as well. In short, T is an
everywhere-continuous function. It is often cited as an example to show
that “continuous” is not the same as “able to be drawn without lifting the
pencil.” The latter may be a useful characterization in the first calculus
course, but graphing y = T(x) in a neighborhood of the origin is impossible
with all those ups and downs.

Finally, we consider the most provocative member of our trio:

$$U(x) = \begin{cases} x^2\sin(1/x) & \text{if } x \neq 0,\\ 0 & \text{if } x = 0. \end{cases}$$

[Figure 10.3]

Here the quadratic factor accelerates the damping of the curve near
the origin. Because U(x) = x·T(x) and both factors are everywhere continuous,
so is U.

This time the troubling issue involves differentiability. At any x ≠ 0,
the function is certainly differentiable, and the rules of calculus show that
U′(x) = 2x sin(1/x) − cos(1/x). At x = 0 the function is differentiable as
well because

$$U'(0) = \lim_{x\to 0}\frac{U(x) - U(0)}{x - 0} = \lim_{x\to 0}\frac{x^2\sin(1/x)}{x} = \lim_{x\to 0} x\sin(1/x) = 0,$$

where the final limit employs the same “squeeze” we just saw. So, in spite
of its being infinitely wobbly near the origin, the function U has a horizontal
tangent there.

We have proved that U is everywhere differentiable with

$$U'(x) = \begin{cases} 2x\sin(1/x) - \cos(1/x) & \text{if } x \neq 0,\\ 0 & \text{if } x = 0. \end{cases}$$


Alas, this derivative is not a continuous function, for we again consider
the sequence {1/kπ} and note that

$$\lim_{k\to\infty} U'\!\left(\frac{1}{k\pi}\right) = \lim_{k\to\infty}\left[\frac{2}{k\pi}\sin(k\pi) - \cos(k\pi)\right] = \lim_{k\to\infty}\left[0 - (-1)^k\right],$$

which does not exist. Thus, $\lim_{x\to 0} U'(x)$ cannot exist, and so U′ is
discontinuous at x = 0. In short, U is a differentiable function with a
discontinuous derivative.
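Both halves of this argument can be seen numerically. In the Python sketch below (our own names; U_prime simply encodes the formula just derived), the difference quotients at 0 shrink with h, while the derivative sampled along x = 1/(kπ) keeps flipping sign.

```python
import math

def U(x: float) -> float:
    """U(x) = x^2 sin(1/x) for x != 0, and U(0) = 0."""
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def U_prime(x: float) -> float:
    """The derivative computed in the text: 2x sin(1/x) - cos(1/x), and U'(0) = 0."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x) if x != 0 else 0.0

# Difference quotients at 0 equal h sin(1/h), squeezed between -|h| and |h|.
for h in (1e-1, -1e-3, 1e-5):
    print(h, (U(h) - U(0)) / h)

# Along x = 1/(k pi) the derivative alternates between +1 and -1 instead of tending to 0.
print([round(U_prime(1 / (k * math.pi)), 6) for k in range(1, 7)])
```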

This brings to mind the famous theorem that a differentiable function 
is continuous. It would be natural to propose the following modification: 
“The derivative of a differentiable function must be continuous.” The 
example of U, however, shows that such a modification is wrong. 

These examples also muddy the relationship between continuity and 
the intermediate value theorem. As we saw, Cauchy proved that a continu- 
ous function must take all values between any two that it assumes. This 
geometrically self-evident fact might appear to be the very essence of conti- 
nuity, and one could surmise that a function is continuous if and only if it 
possesses the intermediate value property over every interval of its domain. 

Again, this assumption turns out to be erroneous. As a counterexample, 
consider S from above. We have seen that S is discontinuous at the origin, 
but we claim that it has the intermediate value property over every interval. 

To prove this, suppose S(a) < r < S(b) for a < b. By the nature of the
cosine, we know that −1 < r < 1. We now consider cases:

If 0 < a < b or if a < b < 0, then S is continuous throughout [a, b] and
so, for some c in (a, b), we have S(c) = r by the intermediate value theorem.

On the other hand, if a < 0 < b, we can fix a whole number N with
N > 1/(2πb). Then

$$a < 0 < \frac{1}{(2N+1)\pi} < \frac{1}{2N\pi} < b,$$

and as x runs between the positive numbers 1/[(2N + 1)π] and 1/(2Nπ),
the value of 1/x runs between 2Nπ and (2N + 1)π. In the process,
S(x) = cos(1/x) goes continuously from cos(2Nπ) = 1 to cos[(2N + 1)π] = −1.
By the intermediate value theorem, there must be a c between
1/[(2N + 1)π] and 1/(2Nπ) (and consequently between a and b) for which
S(c) = r. The claim is thus proved.
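The argument even tells us where to look for c. On the interval from 1/[(2N + 1)π] to 1/(2Nπ), writing 1/c = 2Nπ + t with t in [0, π] gives S(c) = cos t, so t = arccos r does the job. The Python sketch below follows that recipe; the explicit formula for c and the name `ivp_point` are our own additions, valid when a < 0 < b and −1 ≤ r ≤ 1.

```python
import math

def S(x: float) -> float:
    """S(x) = cos(1/x) for x != 0, and S(0) = 0."""
    return math.cos(1 / x) if x != 0 else 0.0

def ivp_point(a: float, b: float, r: float) -> float:
    """Return c in (a, b) with S(c) = r, assuming a < 0 < b and -1 <= r <= 1."""
    N = math.floor(1 / (2 * math.pi * b)) + 1   # a whole number N > 1/(2 pi b)
    # 1/c = 2N pi + arccos(r) lies in [2N pi, (2N+1) pi], so
    # 1/((2N+1) pi) <= c <= 1/(2N pi) < b, and cos(1/c) = cos(arccos r) = r.
    return 1 / (2 * N * math.pi + math.acos(r))

c = ivp_point(-1.0, 0.01, 0.5)
print(c, S(c))
```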

In summary, our examples have shown that the derivative of a differ- 
entiable function need not be continuous and that a function possessing 
the intermediate value property need not be continuous either. These may 
seem odd, but there is one last surprise in store. 

It was discovered by Gaston Darboux (1842-1917), a French mathe- 
matician who is known for a pair of contributions to analysis. First, he 
simplified the development of the Riemann integral so as to achieve the 
same end in a much less cumbersome fashion. Today’s textbooks, when 
they introduce the integral, tend to use Darboux’s elegant treatment 
instead of Riemann’s original. 
But it is the other contribution we address here. In what is now called
“Darboux’s theorem,” he proved that derivatives, although not necessarily
continuous, must possess the intermediate value property. The argument
rests upon two results that appear in any introductory analysis text: one is
that a continuous function takes a minimum value on a closed, bounded
interval [a, b], and the other is that g′(c) = 0 if g is a differentiable function
with a minimum at x = c in (a, b).


Darboux’s Theorem: If f is differentiable on [a, b] and if r is any number for
which f′(a) < r < f′(b), then there exists a c in (a, b) such that f′(c) = r.


Proof: To begin, we introduce a new function g(x) = f(x) − rx. Because f
is differentiable, it is continuous, and rx is continuous as well, so
g is continuous on [a, b]. Further, g is differentiable, with
g′(x) = f′(x) − r.

There is a point c in [a, b] where g takes a minimum value. Because
g′(a) = f′(a) − r < 0, the values of g just to the right of a are smaller
than g(a); likewise, because g′(b) = f′(b) − r > 0, the values of g just to
the left of b are smaller than g(b). Hence a minimum cannot occur at a
or b, and so c lies in (a, b). Then by the second result cited above,

$$0 = g'(c) = f'(c) - r, \quad\text{or simply}\quad f'(c) = r.$$

Thus f′ assumes the intermediate value r, as required. Q.E.D.
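The proof's idea, minimizing g(x) = f(x) − rx, can be imitated numerically. The Python sketch below is our own illustration rather than Darboux's method: it locates the minimum of g on a grid, then refines it by golden-section search, which assumes g is unimodal near the grid minimum (true in the smooth example used here). With f(x) = x³ − x on [0, 2] we have f′(0) = −1 and f′(2) = 11, so for r = 2 the theorem promises a c with f′(c) = 2; here c = 1.

```python
import math
from typing import Callable

def darboux_point(f: Callable[[float], float], a: float, b: float, r: float,
                  samples: int = 10_001, iterations: int = 60) -> float:
    """Approximate the minimizer c of g(x) = f(x) - r*x on [a, b].
    When f'(a) < r < f'(b), the proof shows c is interior, so f'(c) = r."""
    g = lambda x: f(x) - r * x
    # Step 1: coarse minimum over an evenly spaced grid.
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    i = min(range(samples), key=lambda j: g(xs[j]))
    lo, hi = xs[max(i - 1, 0)], xs[min(i + 1, samples - 1)]
    # Step 2: golden-section search on the neighbouring grid cells
    # (assumes g is unimodal there).
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iterations):
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

f = lambda x: x**3 - x
fprime = lambda x: 3 * x**2 - 1
c = darboux_point(f, 0.0, 2.0, 2.0)
print(c, fprime(c))   # c is close to 1, where f'(1) = 2
```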


The reader will recall that in Cauchy’s proof of the mean value theo- 
rem, he assumed his derivative was continuous in order to conclude that 
it took intermediate values. We now see that Cauchy could have discarded 
his assumption without discarding his conclusion. It also follows that a 
function lacking the intermediate value property, for instance, Dirichlet’s 
function, cannot be the derivative of anything. 

Darboux showed that derivatives share with continuous functions the 
property of taking intermediate values. And this suggests another ques- 
tion: “How discontinuous can a derivative really be?” As we see in the 
book’s next-to-last chapter, René Baire provided an answer in 1899.

If derivatives were troubling, integrals were more so. We noted previously
that, even when the sequence {f_n} converges pointwise, we cannot
generally conclude that

$$\lim_{n\to\infty}\int_a^b f_n(x)\,dx = \int_a^b\left[\lim_{n\to\infty} f_n(x)\right]dx. \qquad (1)$$

Weierstrass showed that uniform convergence is sufficient to guarantee the
interchange of limits and integrals, but it turns out not to be necessary. That
is, examples {f_n} were found that converged pointwise but not uniformly
and yet for which (1) holds. Perhaps mathematicians had overlooked some
intermediate condition, not so restrictive as uniform convergence, that
would allow the much-desired interchange.

Or (and this at first seemed a very unlikely “or”) perhaps Riemann’s
definition of the integral was at fault. In treating integration as he did, Rie- 
mann may have taken the wrong path, one that required special conditions 
in order for (1) to hold. If so, his integral could be regarded as defective. 

On the face of it, this sounded like heresy, for Riemann’s integral had 
become a pillar of mathematical analysis. Darboux described it as a cre- 
ation “of which only the greatest minds are capable” [2]. And Paul du 
Bois-Reymond stated his belief that Riemann’s definition could not be 
improved upon, for it extended the concept of integrability to its outer- 
most limits [3]. Yet, as we shall see, this and other shortcomings motivated 
research aimed at defining the integral more broadly. The result would be 
Lebesgue’s theory of integration from the turn of the twentieth century. 

To summarize, the functions above raised such questions as: 


• Can we construct a function continuous at each rational and
discontinuous at each irrational?

• How discontinuous can a Riemann integrable function be?

• How discontinuous can a derivative be?

• How, if at all, can we correct the deficiencies in the Riemann
integral?


Although not an exhaustive list, these were critical issues confronting 
mathematical analysis as the nineteenth century entered its final quarter. 
By their very nature, such questions could hardly have been asked, let 
alone answered, before the contributions of Cauchy, Riemann, and Weier- 
strass. As the challenges grew ever more sophisticated, their resolutions 
would require increasingly careful reasoning. In the remainder of the 
book, we shall indicate how each of these four questions was answered. 

Our first stop, however, will be an 1874 paper by Georg Cantor, the 
genius who gave birth to set theory and applied his ideas to re-prove the 
existence of transcendentals. His achievement illustrates as well as any- 
thing the benefits of thinking anew about matters long regarded as settled. 


CHAPTER 11


Cantor 


Georg Cantor 


“The essence of mathematics lies in its freedom” [1]. So wrote Georg
Cantor (1845-1918) in 1883. Few mathematicians so thoroughly embraced 
this principle and few so radically changed the nature of the subject. Joseph 
Dauben, in his study of Cantor's works, described him as “one of the most 
imaginative and controversial figures in the history of mathematics” [2]. The 
present chapter should demonstrate why this assessment is valid. 

Cantor came from a line of musicians, and it is possible to see in him 
tendencies more often associated with the romantic artist than with the 
pragmatic technician. His research eventually carried him beyond mathe- 
matics to the borders of metaphysics and theology. He raised many an 
eyebrow with claims that Francis Bacon had written the Shakespearean 
canon and that his own theory of the infinite proved the existence of God. 
As an uncompromising advocate of such beliefs, Cantor had a way of alien- 
ating friend and foe alike. 
Meanwhile, his life was troubled. He suffered bouts of severe depression,
almost certainly a bipolar disorder whose recurrences robbed him of
the “mental freshness” he so coveted [3]. Time and again Cantor was sent to 
what were called neuropathic hospitals to endure whatever treatment they 
could offer. In 1918 he died in a psychiatric institution after a life with 
more than its share of unhappiness. 

None of this detracts from Cantor's mathematical triumph. For all of 
his misfortune, Georg Cantor revolutionized the subject whose freedom 
he so loved. 


THE COMPLETENESS PROPERTY 


As a young man, Cantor had studied with Weierstrass at the University 
of Berlin. There he wrote an 1867 dissertation on number theory, a field 
very different from that for which he would become known. His research 
led him to Fourier series and eventually to the foundations of analysis. 

As we have seen, developments in the nineteenth century placed cal- 
culus squarely upon the foundation of limits. It had become clear that lim- 
its, in turn, rested upon properties of the real number system, foremost 
among which is what we now call completeness. Today’s students may
encounter completeness in different but logically equivalent forms, such as: 


C1. Any nondecreasing sequence that is bounded above converges
to some real number.

C2. Any Cauchy sequence has a limit. 

C3. Any nonempty set of real numbers with an upper bound has 
a least upper bound. 


Readers in need of a quick refresher are reminded that {x_n} is a Cauchy
sequence if, for every ε > 0, there exists a whole number N such that, if m
and n are whole numbers greater than or equal to N, then |x_m − x_n| < ε. In
words, a Cauchy sequence is one whose terms get and stay close to one 
another. This idea put in a brief appearance in chapter 6. 

Likewise, M is said to be an upper bound of a nonempty set A if a ≤ M
for all elements a in A, and Λ is a least upper bound, or supremum, of A if
(1) Λ is an upper bound of A and (2) if M is any upper bound of A, then
Λ ≤ M. These concepts appear in any modern analysis text.

There is one other version of completeness, cast in terms of nested 
intervals, that will play an important role in the next few chapters. Again, 
we need a few definitions to clarify what is going on. 
A closed interval [a, b] is nested within [A, B] if the former is a subset
of the latter. This amounts to nothing more than the condition that
A ≤ a < b ≤ B. Suppose further that we have a sequence of closed, bounded
intervals, each nested within its predecessor, as in
$[a_1, b_1] \supseteq [a_2, b_2] \supseteq [a_3, b_3] \supseteq \cdots \supseteq [a_k, b_k] \supseteq \cdots$.
Such a sequence is said to be descending.
With this we can introduce another version of completeness: 


C4. Any descending sequence of closed, bounded intervals has a 
point that belongs to each of the intervals. 


It is worth recalling why the intervals in question must be both closed 
and bounded. The descending sequence of closed (but not bounded) 
intervals 


$$[1, \infty) \supseteq [2, \infty) \supseteq [3, \infty) \supseteq \cdots \supseteq [k, \infty) \supseteq \cdots$$


has no point common to all of them, and the descending sequence of 
bounded (but not closed) intervals 


$$(0, 1) \supseteq (0, 1/2) \supseteq (0, 1/3) \supseteq \cdots \supseteq (0, 1/k) \supseteq \cdots$$


likewise has an empty intersection (to use set-theoretic terminology). 
Although our nineteenth century predecessors often neglected such dis- 
tinctions, we shall arrange for our intervals to be both closed and bounded 
before applying C4. 

Each of these four incarnations of completeness guarantees that some 
real number exists, be it the limit to which a sequence converges, or the 
least upper bound that a set possesses, or a point common to each of a col- 
lection of nested intervals. As mathematicians probed the logical founda- 
tions of calculus, they realized that such existence was often sufficient for 
their theoretical purposes. Rather than identify a real number explicitly, it 
may be enough to know that a number is out there somewhere. Com- 
pleteness provides that assurance. 
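Property C4 is what makes the familiar bisection method work. The Python sketch below is our own illustration, not something from the text: it produces a descending sequence of closed, bounded intervals with rational endpoints that close in on √2. C4 guarantees a real number common to all of them (namely √2), while no rational number lies in all of them, one way to see that the rationals by themselves lack completeness.

```python
import math
from fractions import Fraction

def nested_intervals_for_sqrt2(steps: int):
    """Bisection: closed intervals [a_k, b_k] with rational endpoints and
    a_k^2 < 2 < b_k^2, each nested within its predecessor."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        # mid * mid is never exactly 2, because sqrt(2) is irrational
        if mid * mid < 2:
            a = mid
        else:
            b = mid
        intervals.append((a, b))
    return intervals

ivs = nested_intervals_for_sqrt2(30)
a30, b30 = ivs[-1]
print(float(a30), float(b30), math.sqrt(2))
```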

One might ask: if the completeness property is so important, how do 
we prove it? The answer required mathematicians to understand the real 
number system itself. From the whole numbers, it is a straightforward task 
to define the integers (positive, negative, and zero) and from there to define 
the rationals. But can we create the real numbers from more elementary 
systems, just as the rationals were defined in terms of the integers? 

Affirmative answers to this question came from Cantor and, inde- 
pendently, from his friend Richard Dedekind (1831-1916). Cantor's 
construction of the reals was based on equivalence classes of Cauchy
sequences of rational numbers. Dedekind’s approach employed partitions 
of the rationals into disjoint classes, the so-called “Dedekind cuts.” A thor- 
ough discussion of these matters would carry us far afield, for construct- 
ing the real numbers from the rationals is a bit esoteric for this book and, 
truth be told, a bit esoteric for most analysis courses. Nonetheless, Cantor 
and Dedekind did it successfully and then used their ideas to prove the 
completeness property as a theorem in their newly created realm. 

This achievement can be seen as the final step in the separation of 
calculus from geometry. Dedekind and Cantor had gone back to the arith- 
metic basics—the whole numbers—from which the reals, then the com- 
pleteness property, and eventually all of analysis could be developed. 
Their achievement received the apt but nearly unpronounceable moniker: 
“the arithmetization of analysis.” 


THE NONDENUMERABILITY OF INTERVALS 


It is not for defining the real numbers that Cantor has been chosen to 
headline this chapter. Rather it is for his 1874 paper, “Über eine Eigenschaft
des Inbegriffes aller reellen algebraischen Zahlen” (On a Property of the Total- 
ity of All Real Algebraic Numbers) [4]. This was a landmark in the history 
of mathematics, one that demonstrated, in Dauben’s words, “[Cantor’s] gift 
for posing incisive questions and for sometimes finding unexpected, even 
unorthodox answers” [5]. 

Oddly, the significance of the paper was obscured by its title, for the 
result about algebraic numbers was but a corollary, albeit a most interest- 
ing one, to the paper's truly revolutionary idea. That idea, simply stated, is 
that a sequence cannot exhaust an open interval of real numbers. As we 
shall see, Cantor's argument involved the completeness property, thus 
placing it properly in the domain of real analysis. 


Theorem: If {x_k} is a sequence of distinct real numbers, then any open,
bounded interval (α, β) of real numbers contains a point not included
among the {x_k}.


Proof: Cantor began with an interval (α, β) and considered the sequence
in consecutive order: x_1, x_2, x_3, x_4, .... If none or just one of these terms
lies among the infinitude of real numbers in (α, β), then the proposition
is trivially true.
Suppose, instead, that the interval contains at least two sequence
points. We then identify the first two terms, by which we mean
those with the two smallest subscripts, that fall within (α, β). We
denote the smaller of these by A_1 and the larger by B_1. This step is
illustrated in figure 11.1, where the initial few terms of the sequence
fall outside of (α, β) and the first two terms to fall within it become
A_1 (the smaller) and B_1 (the greater).

We make two simple but important observations: 


1. α < A_1 < B_1 < β, and
2. if a sequence term x_k falls within the open interval (A_1, B_1),
then k ≥ 3.


The second of these recognizes that at least two sequence terms are
used up in identifying A_1 and B_1, so any term lying strictly between A_1
and B_1 must have subscript k = 3 or greater. Figure 11.1 shows the next
such candidate.

Cantor then examined (A_1, B_1) and considered the same pair of
cases: either this open interval contains none or just one of the terms
of {x_k}, or it contains at least two of them. In the first case the theorem
is true, for there are infinitely many other points in (A_1, B_1), and thus
in (α, β), that do not belong to the sequence {x_k}. In the second case,
Cantor repeated the earlier process by choosing the next two terms of
the sequence, that is, those with the smallest subscripts, that fall within
(A_1, B_1). He labeled the smaller of these A_2 and the larger B_2. Figure
11.2, which includes more terms of the sequence than did figure 11.1,
shows these two new points.

Here again it is clear that 


1. α < A_1 < A_2 < B_2 < B_1 < β, and
2. if x_k falls within the open interval (A_2, B_2), then k ≥ 5.


As before, the latter observation follows because at least four terms of the
sequence {x_k} must have been consumed in finding A_1, B_1, A_2, and B_2.


[Figure 11.1: the interval (α, β) with A_1 and B_1 marked among terms of the sequence]

[Figure 11.2: further terms of the sequence, including A_2 and B_2]


Cantor continued in this manner. If at any step there were one or 
fewer sequence terms remaining within the open subinterval, he 
could immediately find a point—indeed infinitely many of them— 
belonging to (α, β) but not to the sequence {x_k}. The only potential
difficulty arose if the process never terminated, thereby generating a
pair of infinite sequences {A_r} and {B_r} such that

1. α < A_1 < A_2 < A_3 < ··· < A_r < ··· < B_r < ··· < B_3 < B_2 < B_1 < β, and
2. if x_k falls within the open interval (A_r, B_r), then k ≥ 2r + 1.


We then have a descending sequence of closed and bounded
intervals $[A_1, B_1] \supset [A_2, B_2] \supset [A_3, B_3] \supset \cdots$, each nested within its
predecessor. By the completeness property (C4), there is at least one
point common to all of the $[A_r, B_r]$. That is, there exists a point $c$
belonging to $[A_r, B_r]$ for all $r \geq 1$. To finish the proof, we need only
establish that $c$ lies in $(\alpha, \beta)$ but is not a term of the sequence $\{x_k\}$.

The first observation is immediate, for $c$ is in $[A_1, B_1] \subset (\alpha, \beta)$
and so $c$ indeed falls within the original open interval $(\alpha, \beta)$.

Could $c$ appear as a term of the sequence $\{x_k\}$? If so, then $c = x_N$
for some subscript $N$. Because $c$ lies in all of the closed intervals, it lies
in $[A_{N+1}, B_{N+1}]$, and thus


$$A_N < A_{N+1} \leq c \leq B_{N+1} < B_N.$$


It follows that $c = x_N$ lies in the open interval $(A_N, B_N)$, and so, according
to (2) above, $N \geq 2N + 1$. This, of course, is absurd. We conclude
that $c$ can be none of the terms in the sequence $\{x_k\}$.

To summarize, Cantor had demonstrated that in $(\alpha, \beta)$ there is a
point not appearing in the original sequence $\{x_k\}$. The existence of
such a point was the object of the proof. Q.E.D.
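Cantor's construction is mechanical enough to run on a finite list. The Python below is a minimal sketch under stated assumptions, not Cantor's own procedure in code. The sequence is a hypothetical finite prefix of one enumeration of the rationals in $(0, 1)$: reduced fractions ordered by denominator, then by numerator. The names `rationals_in_unit_interval` and `cantor_stages` are ours. A finite list always runs out, so the loop must stop, whereas Cantor's argument concerns the case in which it never does. Exact `Fraction` arithmetic avoids rounding problems.

```python
from fractions import Fraction
from math import gcd


def rationals_in_unit_interval(count):
    """First `count` terms of one (hypothetical) enumeration of the rationals
    in (0, 1): reduced fractions p/q, ordered by q and then by p."""
    terms = []
    q = 2
    while len(terms) < count:
        for p in range(1, q):
            if gcd(p, q) == 1:
                terms.append(Fraction(p, q))
                if len(terms) == count:
                    break
        q += 1
    return terms


def cantor_stages(xs, alpha, beta):
    """Run Cantor's nested-interval construction on the finite list
    xs = [x_1, x_2, ...], starting from the open interval (alpha, beta).

    At each stage, take the two terms with the smallest subscripts lying
    strictly inside the current open interval; the smaller is A_r and the
    larger is B_r. Stop when fewer than two such terms remain in the list.
    Returns tuples (A_r, B_r, i, j), where i and j are the 1-based subscripts
    of the two terms chosen at stage r.
    """
    stages = []
    lo, hi = alpha, beta
    while True:
        inside = [k for k, x in enumerate(xs) if lo < x < hi][:2]
        if len(inside) < 2:
            return stages
        i, j = inside
        a_r, b_r = sorted((xs[i], xs[j]))
        stages.append((a_r, b_r, i + 1, j + 1))
        lo, hi = a_r, b_r


xs = rationals_in_unit_interval(2000)
for r, (a_r, b_r, i, j) in enumerate(cantor_stages(xs, Fraction(0), Fraction(1)), start=1):
    print(f"stage {r}: A = {a_r}, B = {b_r} (from x_{i} and x_{j})")
```

With this enumeration, the first stage gives $(A_1, B_1) = (1/3, 1/2)$ from $x_1$ and $x_2$. The second gives $(A_2, B_2) = (2/5, 3/7)$ from $x_7$ and $x_{14}$. At every stage $r$, any term lying inside $(A_r, B_r)$ has subscript at least $2r + 1$, as property (2) requires.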


Today, this theorem is usually preceded by a bit of terminology. 
We define a set to be denumerable if it can be put into a one-to-one 
correspondence with the set of whole numbers. Sequences are trivially
denumerable, with the required correspondence appearing as the sub- 
scripts. An infinite set that cannot be put into a one-to-one correspon- 
dence with the whole numbers is said to be nondenumerable. We then 
characterize the result above as proving that any open interval of real 
numbers is nondenumerable. 

The evolution of Cantor’s thinking on this matter is interesting.
Through the early 1870s, he had pondered the fundamental properties of 
the real numbers, trying to isolate exactly what set them apart from the 
rationals. Obviously, completeness was a key distinction that somehow 
embodied what was meant by “the continuum” of the reals. 

But Cantor began to suspect there was a difference in the abundance of
numbers in these two sets (what we now call their “cardinality”), and in
November of 1873 shared with Dedekind his doubts that the whole num- 
bers could be matched in a one-to-one fashion with the real numbers. 
Implicitly this meant that, although both collections were infinite, the 
reals were more so. 

Try as he might, Cantor could not prove his hunch. He wrote 
Dedekind, in some frustration, “as much as I am inclined to the opinion 
that [the whole numbers] and [the real numbers] permit no such unique 
correspondence, I cannot find the reason” [6]. A month later, Cantor had 
a breakthrough. As a Christmas gift to Dedekind, he sent a draft of his 
proof and, after receiving suggestions from the latter, cleaned it up and 
published what we saw above. Persistence had paid off. 

Readers who know Cantor's “diagonalization” proof of nondenumer- 
ability may be surprised to see that his 1874 reasoning was wholly differ- 
ent. The diagonal argument, which Cantor described as a “much simpler 
demonstration,” appeared in an 1891 paper [7]. In contrast to the 1874 
proof, which, as we have seen, invoked the completeness property, diago- 
nalization was applicable to situations where completeness was irrelevant, 
far from the constraints of analysis proper. 

Although the later argument is more familiar, the earlier one repre- 
sents the historic beginning and so has been included here. We stress 
again that Cantor's original proof did not use terms like denumerability 
nor raise specific questions about infinite cardinalities. All this would 
come later. In 1874, he simply showed that a sequence cannot exhaust an 
open interval. 

But why should anyone care? It was a good question, and Cantor had 
a spectacular answer. 
THE EXISTENCE OF TRANSCENDENTALS, REVISITED


We recall that Cantor’s paper was titled, “On a Property of the Totality 
of All Real Algebraic Numbers.” To this point, algebraic numbers have yet 
to be mentioned, nor have we said anything about the “property” of these 
numbers to which the title refers. The time has come to address those 
omissions. 

As we saw, a real number is algebraic if it is the solution to a polyno- 
mial equation with integer coefficients. There are infinitely many of these 
(for instance, any rational number), and it was no easy matter for Liouville 
to find a number that lay outside the algebraic realm. 

Cantor, upon considering the matter, claimed that it was possible to 
list the algebraic numbers in a sequence. At first glance, this may seem 
preposterous. It would require him to generate a sequence with the twin 
properties that (1) every term was an algebraic number and (2) every 
algebraic number was somewhere in the sequence. A clever eye would be 
necessary to do this in an orderly and exhaustive fashion, but Cantor was 
nothing if not clever. He began by introducing a new idea. 


Definition: If $P(x) = ax^n + bx^{n-1} + cx^{n-2} + \cdots + gx + h$ is an $n$th-degree
polynomial with integer coefficients, we define its height by
$(n - 1) + |a| + |b| + |c| + \cdots + |h|$.


For instance, the height of $P(x) = 2x^3 - 4x^2 + 5$ is $(3 - 1) + 2 + 4 + 5 = 13$,
and that of $Q(x) = x^6 - 6x^4 - 10x^3 + 12x^2 - 60x + 17$ is
$(6 - 1) + 1 + 6 + 10 + 12 + 60 + 17 = 111$.
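Cantor's height is simple enough to compute mechanically. The short Python sketch below is our illustration, not Cantor's. The function name `height` is hypothetical, and so is the convention of listing coefficients from the leading term down to the constant term, zeros included. The sketch handles only degree 1 or higher, as in the text's examples, and it reproduces the two values just computed.

```python
def height(coeffs):
    """Height of an integer polynomial, following the definition above.

    `coeffs` lists the coefficients from the leading term down to the constant
    term, including zero coefficients, e.g. 2x^3 - 4x^2 + 5 -> [2, -4, 0, 5].
    Returns (n - 1) + |a| + |b| + ... + |h| for a polynomial of degree n.
    """
    if not coeffs or coeffs[0] == 0:
        raise ValueError("leading coefficient must be nonzero")
    degree = len(coeffs) - 1
    if degree < 1:
        raise ValueError("this sketch only handles polynomials of degree >= 1")
    return (degree - 1) + sum(abs(c) for c in coeffs)


print(height([2, -4, 0, 5]))                  # prints 13
print(height([1, 0, -6, -10, 12, -60, 17]))   # prints 111
```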

Clearly the height of a polynomial with integer coefficients will itself 
be a whole number. Further, any algebraic number has a minimal-degree 
polynomial whose coefficients we can assume to have no common divisor 
other than 1. These conventions simplify the task at hand. 

Cantor in turn collected all algebraic numbers that arise from polyno- 
mials of height 1, then those that arise from polynomials of height 2, then 
of height 3, and so on. This was the key to arranging algebraic numbers
into an infinite sequence, here denoted by $\{a_n\}$.

To see the process in action, we observe that the only polynomial with
integer coefficients of height 1 is $P(x) = 1 \cdot x^1 = x$. The solution to the
associated equation $P(x) = 0$ is the first algebraic number, namely $a_1 = 0$.
There are four polynomials with height 2:

$$P_1(x) = x^2, \quad P_2(x) = 2x, \quad P_3(x) = x + 1, \quad P_4(x) = x - 1.$$


Setting the first and second equal to zero yields the solution $x = 0$, which
we do not count again. Setting $P_3(x) = 0$ gives $a_2 = -1$, and $P_4(x) = 0$ gives
$a_3 = 1$.

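We continue. There are eleven polynomials of height 3:


$$P_1(x) = x^3, \quad P_2(x) = 2x^2, \quad P_3(x) = x^2 + 1, \quad P_4(x) = x^2 - 1,$$
$$P_5(x) = x^2 + x, \quad P_6(x) = x^2 - x, \quad P_7(x) = 3x, \quad P_8(x) = 2x + 1,$$
$$P_9(x) = 2x - 1, \quad P_{10}(x) = x + 2, \quad P_{11}(x) = x - 2.$$


Upon setting these equal to zero, we get four new algebraic numbers:


$$a_4 = -\tfrac{1}{2}, \quad a_5 = \tfrac{1}{2}, \quad a_6 = -2, \quad \text{and} \quad a_7 = 2.$$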


As his title indicated, Cantor was restricting his attention to real algebraic
numbers, so the equation $P_3(x) = x^2 + 1 = 0$ added nothing to the collection.

And on we go. There are twenty-eight polynomials of height 4, and
from these we harvest a dozen additional algebraic numbers, some of
which are irrational. For instance, the polynomial $P(x) = x^2 + x - 1$ is of
height 4 and contributes

$$\frac{-1 + \sqrt{5}}{2} \quad \text{and} \quad \frac{-1 - \sqrt{5}}{2}.$$
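The counts in the last few paragraphs can be checked by brute force. The Python sketch below is our own illustration, not Cantor's. It assumes the text's apparent convention of listing only polynomials with a positive leading coefficient. It lists every such polynomial of a given height and then harvests the new rational roots with the rational root theorem. The function names are hypothetical. Irrational roots such as $(-1 \pm \sqrt{5})/2$ are deliberately left out, so at height 4 the sketch recovers only the rational part of the dozen new numbers.

```python
from fractions import Fraction
from itertools import product


def polys_of_height(h):
    """All integer polynomials of height h and degree >= 1 with a positive
    leading coefficient, each as a list [leading, ..., constant]."""
    polys = []
    for n in range(1, h + 1):              # degree n contributes n - 1 to the height
        budget = h - (n - 1)               # what remains for the sum of |coefficients|
        for lead in range(1, budget + 1):
            rest = budget - lead           # shared among the n lower coefficients
            for tail in product(range(-rest, rest + 1), repeat=n):
                if sum(abs(c) for c in tail) == rest:
                    polys.append([lead, *tail])
    return polys


def evaluate(coeffs, x):
    """Evaluate a polynomial at x by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def rational_roots(coeffs):
    """Exact rational roots, found by testing every candidate p/q allowed by
    the rational root theorem."""
    coeffs = list(coeffs)
    roots = set()
    while coeffs[-1] == 0:                 # constant term 0: x = 0 is a root; divide by x
        roots.add(Fraction(0))
        coeffs.pop()
    lead, const = abs(coeffs[0]), abs(coeffs[-1])
    for p in (d for d in range(1, const + 1) if const % d == 0):
        for q in (d for d in range(1, lead + 1) if lead % d == 0):
            for r in (Fraction(p, q), Fraction(-p, q)):
                if evaluate(coeffs, r) == 0:
                    roots.add(r)
    return roots


def new_rational_numbers(h, seen):
    """Rational roots first appearing at height h (those not in `seen`)."""
    found = set()
    for poly in polys_of_height(h):
        found |= rational_roots(poly)
    return found - seen


seen = set()
for h in range(1, 5):
    fresh = new_rational_numbers(h, seen)
    seen |= fresh
    print(h, len(polys_of_height(h)), [str(r) for r in sorted(fresh)])
```

The sketch reports 1, 4, 11, and 28 polynomials for heights 1 through 4, matching the counts above. Its new rational roots are $0$; then $\pm 1$; then $\pm\tfrac{1}{2}$ and $\pm 2$; then $\pm\tfrac{1}{3}$ and $\pm 3$. The remaining eight of the dozen height-4 numbers are irrational.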


As the heights increase, more and more algebraic numbers appear. 
Conversely, any specific algebraic number must arise from some polyno- 
mial with integer coefficients, and this polynomial, in turn, has a height. 


For instance, the algebraic number $\sqrt{2} + \sqrt[3]{5}$, which we encountered in
chapter 8, is a solution to the polynomial equation
$x^6 - 6x^4 - 10x^3 + 12x^2 - 60x + 17 = 0$ with height 111.
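The claim that $\sqrt{2} + \sqrt[3]{5}$ satisfies this sixth-degree equation can be checked by hand. The derivation below is our own verification, not taken from the text. It isolates the cube root, cubes both sides, collects the terms involving $\sqrt{2}$, and squares.

$$
\begin{aligned}
x = \sqrt{2} + \sqrt[3]{5}
&\;\Longrightarrow\; (x - \sqrt{2})^3 = 5 \\
&\;\Longrightarrow\; x^3 - 3\sqrt{2}\,x^2 + 6x - 2\sqrt{2} = 5 \\
&\;\Longrightarrow\; x^3 + 6x - 5 = \sqrt{2}\,(3x^2 + 2) \\
&\;\Longrightarrow\; (x^3 + 6x - 5)^2 = 2\,(3x^2 + 2)^2 \\
&\;\Longrightarrow\; x^6 + 12x^4 - 10x^3 + 36x^2 - 60x + 25 = 18x^4 + 24x^2 + 8 \\
&\;\Longrightarrow\; x^6 - 6x^4 - 10x^3 + 12x^2 - 60x + 17 = 0.
\end{aligned}
$$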
A few simple observations allowed Cantor to wrap up his argument: 


• For a given height, there are only finitely many polynomials
with integer coefficients.

• Each such polynomial can generate only finitely many new
algebraic numbers (because an $n$th-degree polynomial equation
can have no more than $n$ solutions).

• Hence, for each height there can be only finitely many new
algebraic numbers.
This means that, upon “entering” a given height in our quest for algebraic
numbers, we must emerge from that height after finitely many steps. We 
cannot get “stuck” in a height trying to list an infinitude of new algebraic 
numbers. 

Consequently, the number $\sqrt{2} + \sqrt[3]{5}$, with its polynomial of height
111, has to show up somewhere in our sequence $\{a_n\}$. It will take a while,
but the process must, after finitely many steps, bring us to height 111, and
then, as we run through the polynomials of this height, we reach
$x^6 - 6x^4 - 10x^3 + 12x^2 - 60x + 17$ after finitely many more. This will determine the
position of $\sqrt{2} + \sqrt[3]{5}$ in the sequence $\{a_n\}$. The same can be said of any
real algebraic number. So, the “property” of the algebraic numbers mentioned in
Cantor’s title is, in modern parlance, their denumerability.

Now he combined his two results: first, that a sequence cannot exhaust 
an interval and, second, that the algebraic numbers form a sequence. Indi- 
vidually, these are interesting. Together, they allowed him to conclude that 
the algebraic numbers cannot account for all points on an open interval. 
Consequently, within any $(\alpha, \beta)$, there must lie a transcendental.

Or, to put it directly, transcendental numbers exist. 

Of course, this was what Liouville had demonstrated a few decades
earlier when he showed that

$$\sum_{k=1}^{\infty} \frac{1}{10^{k!}} = \frac{1}{10} + \frac{1}{10^{2}} + \frac{1}{10^{6}} + \frac{1}{10^{24}} + \frac{1}{10^{120}} + \cdots$$

was transcendental. To prove the existence of transcendental
numbers, he went out and found one.

Cantor reached the same end by very different means. Early in his
1874 paper, he had promised “a new proof of the theorem first demonstrated
by Liouville,” and he certainly delivered [8]. But his argument, as
we have seen, contained no example of a specific transcendental. It was
strikingly nonexplicit.

To contrast the two approaches, we offer the analogy of finding a needle
in a haystack. We envision Liouville, industrious to a fault, putting on
his old clothes, hiking out to the field, and rooting around in the hay under
a broiling sun. Hours later, drenched with perspiration, he pricks his finger
on the elusive quarry: a needle! Cantor, by contrast, stays indoors using
pure reason to show that the mass of the haystack exceeds the mass of the
hay in it. He deduces that there must be something else, that is, a needle, to
account for the excess. Unlike Liouville, he remains cool and spotless.

Some mathematicians were troubled by a nonconstructive proof that
relied upon the properties of infinite sets. Compared to Liouville’s lengthy
argument, Cantor’s seemed too easy, almost like sleight-of-hand. The
young Bertrand Russell (1872-1970) may not have been alone in his
initial reaction to Cantor’s ideas:


I spent the time reading Georg Cantor, and copying out the gist of 
him into a notebook. At that time I falsely supposed all his argu- 
ments to be fallacious, but I nevertheless went through them all in 
the minutest detail. This stood me in good stead when later on I 
discovered that all the fallacies were mine [9]. 


Like Russell, mathematicians came to appreciate Cantor for the inno- 
vator he was. His 1874 paper ushered in a new era for analysis, where the 
ideas of set theory would be employed alongside the $\varepsilon$-$\delta$ arguments of
the Weierstrassians. 

Cantor's work had consequences, many of which were truly astonish- 
ing. For instance, it is easy to show that if the algebraic numbers and the 
transcendental numbers are each denumerable, then so is their union, the 
set of all real numbers. Because this is not so, Cantor knew that the tran- 
scendentals form a nondenumerable set and thus far outnumber their 
algebraic cousins. Eric Temple Bell put it this way: “The algebraic numbers 
are spotted over the plane like stars against a black sky; the dense black- 
ness is the firmament of the transcendentals” [10]. This is a delightfully 
unexpected realization, for the plentiful numbers seem scarce, and the 
scarce ones seem plentiful. In a sense, Cantor showed that the transcen- 
dentals are the hay and not the needles. 
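The “easy” step can be sketched as follows. If both sets could be listed as sequences, alternating between the two lists would list their union, because the $n$th term of either list lands at a finite position ($2n - 1$ or $2n$). The Python below is a hypothetical illustration in which two infinite integer iterators stand in for the two lists; the name `interleave` is ours. Since the algebraic and transcendental numbers are disjoint, no term would be repeated.

```python
from itertools import count, islice


def interleave(first, second):
    """Merge two infinite enumerations a_1, a_2, ... and t_1, t_2, ... into
    the single sequence a_1, t_1, a_2, t_2, ...; a_n lands at position 2n - 1
    and t_n at position 2n, so every term of either list appears eventually."""
    for a, t in zip(first, second):
        yield a
        yield t


positives = count(1)                  # stand-in for the first list: 1, 2, 3, ...
negatives = (-n for n in count(1))    # stand-in for the second list: -1, -2, ...
print(list(islice(interleave(positives, negatives), 6)))   # [1, -1, 2, -2, 3, -3]
```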

A related but more far-reaching consequence was the distinction 
between “small” and “large” infinite sets. Cantor proved that a denumer- 
able set, although infinite, was insignificantly infinite when compared to a 
nondenumerable counterpart. As his ideas took hold, mathematicians 
came to regard denumerable sets as so much jetsam, easily expendable 
when addressing questions of importance. 

As we shall see, dichotomies between large and small sets would arise 
in other analytic settings. At the turn of the nineteenth century, René Baire 
found a “large/small” contrast in what he called a set's “category,” and 
Henri Lebesgue found another in what he called its “measure.” Although 
cardinality, category, and measure are distinct concepts, each provided a 
means of comparing sets that would prove valuable in mathematical 
analysis. 

Cantor addressed other questions about infinite sets. One was, “Are 
there nondenumerable sets having greater cardinality than intervals?” This 
he answered in the affirmative. Another was, “Are there infinite sets of an 
intermediate cardinality between a denumerable sequence and a
nondenumerable interval?” This he never succeeded in resolving. With Cantor’s
founding vision and continuing research, set theory took on a life of its 
own, quite apart from the concerns of analysis proper. But it all grew out 
of his 1874 paper. 

Unlike many revolutionaries down through history, Georg Cantor 
lived to see his ideas embraced by the wider community. An early enthusi- 
ast was Russell, who described Cantor as “one of the greatest intellects of 
the nineteenth century” [11]. This is no small praise from a mathemati- 
cian, philosopher, and eventual Nobel laureate. 

Another of Cantor’s admirers was the Italian prodigy Vito Volterra. His 
work, which beautifully combined Weierstrassian analysis and Cantorian 
set theory, is the subject of our next chapter. 


CHAPTER 12 




Volterra 


Vito Volterra 


Vito Volterra (1860-1940) flourished alongside a number of Italian
mathematicians in the second half of the nineteenth century. Like his coun- 
trymen Giuseppe Peano (1858-1932), Eugenio Beltrami (1835-1900), 
and Ulisse Dini (1845-1918), he left his mark, contributing to applied 
areas like electrostatics and fluid dynamics, as well as to theoretical ones 
like mathematical analysis. It is of course the last of these that we consider 
here. 

Although born on the Adriatic coast, Volterra was raised in Florence, 
the epicenter of the Italian Renaissance. He walked the same streets as had 
Michelangelo and attended schools named after Dante and Galileo. The 
fifteenth- and sixteenth-century Florentine atmosphere seems to have
seeped into his bones, for Volterra loved art, literature, and music even 
as he loved science. He was a Renaissance Man, albeit three centuries 
removed. 
Besides these pursuits, his political courage deserves to be celebrated.
Witnessing the rise of Mussolini in the 1920s, Volterra took a public 
stand in opposition and signed a declaration against the regime. This act 
ultimately cost him his job but made him a hero for Italian intellectuals of 
the time. Upon his death in 1940, Italy had not yet shed its fascist 
scourge, but Volterra had fought the good fight in anticipation of a better 
future. 

If he showed great courage late in life, he had shown great precocity 
early on. Young Volterra read college-level mathematics texts at age 11, 
impressed his teachers during adolescence, and somehow secured a posi- 
tion as a physics laboratory assistant at the University of Florence while 
still in high school. His academic career was spectacularly rapid, culmi- 
nating with a doctorate in physics at the age of 22 [1]. 

In this chapter we discuss a pair of Volterra’s early discoveries, both 
published in 1881, three years after his high school graduation. The first 
was another in the growing list of pathological counterexamples, one that 
turned up a previously unnoticed flaw in the Riemann integral. The sec- 
ond, almost paradoxically, was a theorem showing that pathology has its 
limits, for Volterra proved that no function can be continuous at each 
rational point and discontinuous at each irrational one. Such a function 
would simply be too pathological to exist. We shall examine the theorem 
in full, but we begin with a few words about the counterexample. 


VOLTERRA’S PATHOLOGICAL FUNCTION 


The second version of the fundamental theorem of calculus, which we 
saw in chapter 6, was stated by Cauchy as follows: “If $F$ is differentiable
and if its derivative $F'$ is continuous, then $\int_a^b F'(x)\,dx = F(b) - F(a)$.”


Informally, this says that under the right conditions the integral of the 
derivative restores the original function. In the proof, Cauchy used the 
hypotheses that (a) F has a derivative and (b) this derivative is itself contin- 
uous. But were both necessary? 

Statement (a) seems indispensable, for we could not hope to integrate 
a derivative if the derivative fails to exist. But the status of (b) is more sus- 
pect. Must we assume something as restrictive as the continuity of F’ in 
order for the result to hold? 

This is not a trivial issue. On the one hand, we saw in chapter 10 that
the continuity of a derivative cannot be taken for granted, for the function

$$U(x) = \begin{cases} x^2 \sin(1/x) & \text{if } x \neq 0, \\ 0 & \text{if } x = 0, \end{cases}$$

has a discontinuous derivative. On the other hand, we do not need continuity
to guarantee the existence of an integral, for it is easy to find discontinuous
but integrable functions.

The question, then, was what condition, if any, we should impose
upon $F'$ to guarantee the truth of the fundamental theorem. Discoveries of
the previous years gave mathematicians a perspective on the matter that
Cauchy did not have, so it seemed worthwhile to revisit this important
theorem.

In 1875, Gaston Darboux succeeded in weakening hypothesis (b). He
proved that $\int_a^b F'(x)\,dx = F(b) - F(a)$ provided that (a) $F$ is differentiable
and (b′) its derivative $F'$ is Riemann integrable. Thus, we need not
assume the continuity of $F'$; the mere existence of $\int_a^b F'(x)\,dx$ is sufficient for
the fundamental theorem to hold.


This was progress of a sort, but there remained the issue of whether 
we need to assume anything about F’ other than its existence. Perhaps 
derivatives are integrable by their very nature. If so, we could jettison both 
hypotheses (b) and (b’) and build the fundamental theorem of calculus 
upon the assumption of (a) alone. That would be a less restrictive, and 
much more elegant, state of affairs. 

It came down to this: How ill behaved can a derivative be? In an earlier 
chapter, we proved Darboux’s theorem that a derivative, even if not con- 
tinuous, must possess the intermediate value property. In that regard, 
derivatives seemed fairly “tame,” and mathematicians might guess that 
such tameness would include integrability. 

It was this misconception that the young Volterra refuted in his 1881 
paper “Sui principii del calcolo integrale” [2]. There he provided an example 
of a function F that had a bounded derivative at all points but whose deriv- 
ative was so discontinuous as to be nonintegrable. In other words, even 
though $F$ was everywhere differentiable and its derivative $F'$ was bounded,
the integral $\int_a^b F'(x)\,dx$ did not exist. And, because the integral failed to
exist, the equation $\int_a^b F'(x)\,dx = F(b) - F(a)$ could not be true. Volterra’s
example was striking not because the left-hand side of this equation was
different from the right-hand side, but because the left-hand side was
meaningless!

We shall not consider his function in detail, in part because it is com- 
plicated and in part because one chapter devoted to a pathological function 
(Weierstrass’s) may be enough. The interested reader will find a discussion
of Volterra’s work in [3]. 

One thing was clear: another unfortunate feature of the Riemann inte- 
gral had been unearthed. Mathematicians would have loved nothing more 
than an uncluttered theorem to the effect that if $F$ is differentiable with a
bounded derivative $F'$, then $\int_a^b F'(x)\,dx = F(b) - F(a)$. Volterra showed
that, so far as Riemann’s integral was concerned, this was not to be.

How could mathematicians respond to Volterra’s strange example?
One option was to accept the outcome and move on. When applying the 
fundamental theorem, we would simply impose an extra assumption 
about the derivative F’. This was the path of least resistance. 

There was, however, an alternative. As we saw earlier, Riemann’s
integral provided no guarantee that
$\lim_{k \to \infty} \int_a^b f_k(x)\,dx = \int_a^b \lim_{k \to \infty} f_k(x)\,dx$.

Now Volterra had destroyed any hope for a simple fundamental theorem 
of calculus. As the nineteenth century neared its end, there was more rea- 
son than ever to suspect that the trouble lay in Riemann’s definition and 
not in the intrinsic nature of analysis. A few daring souls, motivated in 
part by Volterra’s pathological function, were about to forsake the Rie- 
mann integral in order to salvage the theorems above. Stay tuned. 


HANKEL’S TAXONOMY 


By the 1880s, mathematical analysis was awash in pathological coun- 
terexamples, each seemingly stranger than the last. Among those we have 
seen are: 

(a) Dirichlet’s function $\varphi(x) = \begin{cases} c & \text{if } x \text{ is rational}, \\ d & \text{if } x \text{ is irrational} \end{cases}$ (with $c \neq d$), which is
everywhere discontinuous and not Riemann integrable.
(b) The extended ruler function $R$, which is continuous at each irrational
and discontinuous at each rational but also is Riemann
integrable with $\int_0^1 R(x)\,dx = 0$.
(c) Weierstrass’s pathological function $f(x) = \sum_{k=0}^{\infty} b^k \cos(a^k \pi x)$,
which is everywhere continuous and nowhere differentiable.


The situation suggested analytic chaos and cried out for order to be 
imposed upon so disorderly a mathematical scene. 
One who tried to do just that was Hermann Hankel (1839-1873). He
was an admirer of Riemann who believed that functions should be classified 
in a manner familiar to biologists or geologists. He proposed such a classifi- 
cation in 1870, a few years before his untimely death. With this taxonomy, 
he hoped to clarify the nature and limitations of mathematical analysis. 

Hankel considered the family of all bounded functions defined on an 
interval [a, b] and distinguished them by means of their continuity/dis- 
continuity properties. To see how he proceeded, we recall a familiar defi- 
nition of Georg Cantor. 


Definition: A set A of real numbers is dense if any open interval contains 
at least one member of A. 


Elementary examples of dense sets are the rationals and the irrationals 
because any open interval holds infinitely many of both. The name is sug- 
gestive, for members of a dense set are so tightly packed that they are 
always nearby. 

With this in mind, we are ready for Hankel’s classification. In class 1 
he placed those functions continuous at all points of [a, b]. These were 
well behaved in that they assumed maximum and minimum values, pos- 
sessed the intermediate value property, and could be integrated. In Han- 
kel’s taxonomy, class 1 represented the top of the food chain. 

His second class included functions continuous except at finitely 
many points of $[a, b]$. These were more problematic, but their irregularities,
being finite in number, remained largely under control. One example
is $S(x) = \begin{cases} \sin(1/x) & \text{if } x \neq 0, \\ 0 & \text{if } x = 0, \end{cases}$ defined on $[-1, 1]$ because, as we saw in
chapter 10, it has a single discontinuity at $x = 0$. Alternatively, one could
take a continuous function on an interval [a, b] and redefine it at, say, fifty 
points in order to introduce fifty discontinuities. Such a function would 
fall into Hankel’s class 2. 

Logically, there was but one class left: those functions possessing infi- 
nitely many points of discontinuity in [a, b]. These, of course, were the 
worst, but Hankel believed that they could be subdivided into the bad and 
the very bad: 


Class 3A: Functions discontinuous at infinitely many points of 
[a, b] but still continuous on a dense set. These he called 
“pointwise discontinuous.” 

Class 3B: Everything else. These Hankel called “totally discontinuous.”
We see that a pointwise discontinuous function in class 3A, in spite
of its infinitude of discontinuities, must be continuous somewhere in any
open interval. On the other hand, for a function in class 3B there
must exist some open subinterval $(c, d)$ within $(a, b)$ where the function
has no point of continuity at all. A totally discontinuous function
thus has a solid subinterval consisting of nothing but points of
discontinuity.

Where do the three pathological functions cited above fit into Han- 
kel’s scheme? Dirichlet’s function, being discontinuous everywhere, falls 
into class 3B as totally discontinuous. The ruler function is discontinuous 
at infinitely many points (the rationals) yet continuous on a dense set (the 
irrationals) and consequently belongs to class 3A as pointwise discontinu- 
ous. And Weierstrass’s function, perhaps the weirdest of all, is
paradoxically in class 1, for it is continuous everywhere.

Hankel found his classification important in the following sense: he 
knew that functions in class 1 and in class 2 are Riemann integrable, and 
the examples at his fingertips of pointwise discontinuous functions were 
integrable as well. By contrast, Dirichlet’s totally discontinuous function 
was not. To him, the gap between classes 3A and 3B seemed to be the 
unbridgeable chasm. As Thomas Hawkins put it, “By making the distinc- 
tion between pointwise and totally discontinuous functions, Hankel 
believed he had separated the functions amenable to mathematical analy- 
sis from those beyond its reaches” [4]. 

To demonstrate the value of all this, Hankel proved a spectacular the- 
orem: a bounded function on [a, b] was Riemann integrable if and only if 
it was no worse than pointwise discontinuous. That is, provided it fell into 
class 1, class 2, or class 3A, a bounded function could be integrated; those 
that occupied class 3B were not integrable and, by extension, analytically 
hopeless. 

Hankel’s theorem appeared to answer the major question we intro- 
duced earlier: “How discontinuous can an integrable function be?” 
The answer, according to him, was, “at worst pointwise discontinuous.” 
His proof showed that, so long as a function was continuous on a 
dense set, all those discontinuities would not matter in terms of inte- 
grability. This was exactly the kind of simple result mathematicians had 
longed for. 

Unfortunately, it was also incorrect. 

With ideas this complicated, even great scholars can make mistakes, 
and Hankel made a doozy. To be fair, half of his theorem was true: if a 
function is Riemann integrable, it must indeed be continuous on a dense 
set. A totally discontinuous function, having a solid subinterval of points
of discontinuity, cannot possess a Riemann integral. Again, one thinks of 
Dirichlet’s function in this regard. 

But Hankel’s proof of the converse was flawed. In 1875, the British 
mathematician H. J. S. Smith (1826-1883) published an example of a 
pointwise discontinuous but non-integrable function which, he said, 
“deserves attention because it is opposed to a theory of discontinuous 
functions which has received the sanction of an eminent geometer, 
Dr. Hermann Hankel, whose recent death at an early age is such a great 
loss to mathematical science” [5]. Smith’s example was nontrivial, requir- 
ing the construction of what we now call a nowhere dense set of positive 
measure. We refer those seeking details to Hawkins [6]. For now, we 
merely observe that the link between continuity and Riemann integrability 
remained unclear, and the question of how discontinuous an integrable 
function could be was still open. Pointwise discontinuity, whatever its 
value, did not provide the long-sought connection. 

Nonetheless there had been progress of a sort. Riemann had extended 
the notion of integrability to include some highly discontinuous func- 
tions, and the true half of Hankel’s theorem, along with Smith’s counterex- 
ample, showed that the Riemann-integrable functions were properly 
embedded within the larger collection of functions that were continuous 
on a dense set. 

We note in passing that the term “pointwise discontinuous” has some- 
times been carelessly taken to mean “at worst pointwise discontinuous.” 
That is, all functions in Hankel’s classes 1, 2, or 3A were lumped under the
single rubric of pointwise discontinuity, which led to the bizarre situation 
of placing the continuous functions (class 1) among the “pointwise dis- 
continuous” ones. Because the common property of functions in these 
first three classes is that each is continuous on a dense set, we might sug- 
gest densely continuous as an umbrella term to include all functions in 
classes 1, 2, and 3A. 

In any case, Hankel’s taxonomy initially seemed to be a promising 
vehicle for carving apart the analytically accessible functions from the ana- 
lytically intractable ones. As it turned out, however, many of those 
intractable functions could be handled quite nicely within the context of 
set theory and the Lebesgue integral. Nowadays, Hankel’s distinctions 
have largely fallen by the wayside. 

But in the late nineteenth century, pointwise discontinuity remained a 
topic of research capable of engaging the most talented mathematicians. 
One of these was the 21-year-old Vito Volterra. 
THE LIMITS OF PATHOLOGY


The epidemic of pathological functions suggested that any behavior, 
no matter how bizarre, could be realized by an ingeniously constructed 
example from a suitably inventive mathematician. 

Who, for instance, could envision the ruler function, continuous at 
each irrational point and discontinuous at each rational one? And why not 
suppose that somewhere, waiting to be discovered, lay an equally peculiar 
function continuous at each rational point and discontinuous at each irra- 
tional? One seemed no more outlandish than the other. 

That continuity and discontinuity points can sometimes be inter- 
changed is evident in the following examples. First define 


$$H(x) = \begin{cases} 1 & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$

This is obviously continuous at all points but the origin, where it has its
lone point of discontinuity.


As its counterpart, we introduce

$$K(x) = \begin{cases} x^2 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational}. \end{cases}$$

It is not difficult to see that $K$ is discontinuous at any $a \neq 0$. For, if we let $\{x_k\}$
be a sequence of rationals converging to $a$ and $\{y_k\}$ be a sequence of
irrationals converging to $a$, then $\lim_{k \to \infty} K(x_k) = \lim_{k \to \infty} x_k^2 = a^2$, whereas
$\lim_{k \to \infty} K(y_k) = \lim_{k \to \infty} 0 = 0 \neq a^2$. Because these sequential limits differ, we
know that $\lim_{x \to a} K(x)$ cannot exist, and so $K$ is discontinuous at $x = a$.
However, for any $x$, be it rational or irrational, we have $0 \leq K(x) \leq x^2$,
and so a simple squeezing argument shows that $\lim_{x \to 0} K(x) = 0 = K(0)$. It
follows that $K$ is a function with a lone point of continuity: the origin. So,
for $H$ and $K$ as defined here, the points of continuity and of discontinuity
have been swapped.

In this regard, it will be useful to introduce the following. 


Definition: For a function $f$, we let $C_f = \{x \mid f \text{ is continuous at } x\}$ and
$D_f = \{x \mid f \text{ is discontinuous at } x\}$.


Our previous discussion can be neatly summarized by: $C_H = \{x \mid x \neq 0\} = D_K$
and $C_K = \{0\} = D_H$.

The issue of interchanging continuity and discontinuity points is an
intriguing one. For any function $f$, is there a “complementary” function $g$
with $C_f = D_g$ and $C_g = D_f$? If so, how would one find it? If not, what would
prevent it?

In his 1881 paper, “Alcune osservazioni sulle funzioni punteggiate
discontinue,” Volterra addressed this matter. The result was a powerful theorem
with a pair of first-rate corollaries [7]. 


Theorem: There cannot exist two pointwise discontinuous functions on 
the interval (a, b) for which the continuity points of one are the dis- 
continuity points of the other, and vice versa. 


Proof: He proceeded by contradiction, assuming at the outset that f and φ are pointwise discontinuous on (a, b) such that $C_f = D_\varphi$ and $D_f = C_\varphi$. In other words, $C_f$ and $C_\varphi$ partition (a, b) into nonempty, disjoint, dense subsets.

His proof rested upon a nested sequence of subintervals. Because f is pointwise discontinuous, it must have a point of continuity $x_0$ somewhere in (a, b). For $\varepsilon = 1/2$, continuity guarantees that there exists a $\delta > 0$ so that $(x_0 - \delta, x_0 + \delta)$ is a subset of (a, b) and, if $0 < |x - x_0| < \delta$, then $|f(x) - f(x_0)| < 1/2$. We now choose $a_1 < b_1$ so that $[a_1, b_1]$ is a closed subinterval of the open set $(x_0 - \delta, x_0 + \delta)$, as depicted in figure 12.1.

For any two points x and y in $[a_1, b_1]$, we apply the triangle inequality to see that

$|f(x) - f(y)| \le |f(x) - f(x_0)| + |f(x_0) - f(y)| < \frac{1}{2} + \frac{1}{2} = 1.$ (1)


This means that f does not oscillate more than 1 unit on the closed interval $[a_1, b_1]$.

But $(a_1, b_1)$ is an open subinterval of (a, b) and φ is pointwise discontinuous as well. Thus there is a point of continuity of φ, say $x_1$, within $(a_1, b_1)$. Repeating the previous argument for φ, we find points $a_1' < b_1'$ such that the closed interval $[a_1', b_1']$ is a subset of $(a_1, b_1)$ and $|\varphi(x) - \varphi(y)| < 1$ for any x and y in $[a_1', b_1']$. See figure 12.2.


Figure 12.1 (points marked: $x_0 - \delta$, $x_0$, $x_0 + \delta$)

Figure 12.2 (points marked: $a$, $a_1$, $a_1'$, $b_1'$, $b_1$, $b$)


Combining this conclusion with that of (1) above, we have found a closed subinterval $[a_1', b_1']$ so that, for all x and y within it,

$|f(x) - f(y)| < 1$ and $|\varphi(x) - \varphi(y)| < 1$.


Volterra then exploited pointwise discontinuity to repeat the argument with $\varepsilon = 1/4$. Considering first f and then φ, he found a closed interval $[a_2', b_2']$ lying within the open interval $(a_1', b_1')$, and thus inside $[a_1', b_1']$, such that $|f(x) - f(y)| < 1/2$ and $|\varphi(x) - \varphi(y)| < 1/2$ for any points x and y in $[a_2', b_2']$.


He continued with $\varepsilon = 1/8, 1/16$, and generally $1/2^k$, thereby generating closed intervals $[a_1', b_1'] \supset [a_2', b_2'] \supset [a_3', b_3'] \supset \cdots$ such that

$|f(x) - f(y)| < \frac{1}{2^{k-1}}$ and $|\varphi(x) - \varphi(y)| < \frac{1}{2^{k-1}}$ for any x and y in $[a_k', b_k']$. (2)


A contradiction was at hand. By the completeness property, there must be a point c common to all of the nested intervals $[a_k', b_k']$. Because c lies in $[a_1', b_1']$, it is indeed in our original interval (a, b).

We next claim that f is continuous at c. This follows easily, for Volterra had controlled the oscillation of f as he constructed his descending intervals. To be thoroughly Weierstrassian about it, we could take any $\varepsilon > 0$ and choose a whole number k so that $1/2^{k-1} < \varepsilon$. We know that c is a point of $[a_{k+1}', b_{k+1}']$, which in turn lies within the open interval $(a_k', b_k')$, so we can find a $\delta > 0$ with $(c - \delta, c + \delta) \subset (a_k', b_k') \subset [a_k', b_k']$. Consequently, for any x with $0 < |x - c| < \delta$, both x and c lie in $[a_k', b_k']$, so (2) gives $|f(x) - f(c)| < 1/2^{k-1} < \varepsilon$. This proves that $\lim_{x\to c} f(x) = f(c)$, and so f is continuous at c as claimed.


Because the same argument, word for word, can be applied to φ, it too is continuous at c. In this way, we have reached our contradiction, for c belongs to both $C_f$ and $C_\varphi$, violating the hypothesis that the continuity points of one are the discontinuity points of the other. There is no alternative but to conclude that two such pointwise discontinuous functions cannot exist. Q.E.D.


Before proceeding, we make a pair of observations. The first is that Volterra was vague about insisting that the intervals $[a_k', b_k']$ be closed. This is an omission easily repaired, as we have done. Second, in the example above where the continuity points of H are the discontinuity points of K and vice versa, we note that K is totally discontinuous (Hankel’s class 3B) rather than pointwise discontinuous (Hankel’s class 3A). Consequently, lest anyone lose sleep on this account, that example in no way contradicts Volterra’s result.

He followed his theorem with two important corollaries. The first, 
which settled a major question of analysis, was stated as follows: 


Because we have a function continuous at each irrational point 
and discontinuous at each rational, it will be impossible to find a 
function that is discontinuous at each irrational point and contin- 
uous at each rational. [8] 


To flesh out his argument, we imagine a function G for which $C_G$ is the (dense) set of rationals. Then G is pointwise discontinuous. But we have previously encountered the extended ruler function R, which is pointwise discontinuous as well, with $C_R$ being the set of irrationals. The
continuity points of G would then be the discontinuity points of R, in con- 
tradiction to Volterra’s theorem. Consequently, it is impossible for both 
functions to exist. Because the ruler function most certainly does exist, we 
are forced to conclude that the function G does not. Volterra’s theorem 
demonstrated, in the parlance of a Western movie, that “this town is not 
big enough for both of them.” A function continuous only on the rationals 
is a logical impossibility. 
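To see numerically why R behaves as described, here is a Python sketch that assumes the usual definition of the extended ruler function: $R(p/q) = 1/q$ for a rational $p/q$ in lowest terms with $q > 0$, and $R(x) = 0$ for irrational x. Exact `Fraction` arithmetic lets us follow rationals approaching the irrational $\sqrt{2}$ (through its continued-fraction convergents, a choice made only for illustration) and rationals approaching the rational $1/2$. The helper names are hypothetical.

```python
from fractions import Fraction


def ruler(x: Fraction) -> Fraction:
    """Ruler function at a rational x = p/q in lowest terms (q > 0): returns 1/q.

    R is 0 at irrational points, which Fraction cannot represent."""
    return Fraction(1, x.denominator)


def sqrt2_convergents(n):
    """First n continued-fraction convergents of sqrt(2): 1, 3/2, 7/5, 17/12, ..."""
    p, q = 1, 1
    out = []
    for _ in range(n):
        out.append(Fraction(p, q))
        p, q = p + 2 * q, p + q
    return out


# Rationals closing in on the irrational sqrt(2): R-values shrink toward 0 = R(sqrt(2)).
toward_irrational = [ruler(c) for c in sqrt2_convergents(6)]

# Rationals closing in on the rational 1/2: R-values also shrink toward 0,
# yet R(1/2) = 1/2, so R is discontinuous there.
toward_half = [ruler(Fraction(1, 2) + Fraction(1, n)) for n in (10, 100, 1000)]

print(toward_irrational, toward_half, ruler(Fraction(1, 2)))
```

Along rationals the denominators must grow as the points close in on any fixed target, so R-values tend to 0 everywhere; that matches $R$ only where $R$ itself is 0, namely at the irrationals.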

Pathology, then, has its limits. No matter how clever the mathemati- 
cian, certain functions remain beyond the pale, a fact Volterra demonstrated 
with this clever argument. But he had one more corollary up his sleeve, 
that there can be no continuous function taking rationals to irrationals 
and vice versa [9]. 


Corollary: There does not exist a continuous function g defined on the 
real numbers such that g(x) is rational when x is irrational and g(x) is 
irrational when x is rational. 
Proof: Again, for the sake of contradiction, Volterra assumed such a function g exists. We then define G by $G(x) = R(g(x))$, where R is the extended ruler function from above, and make two claims about G:


Claim 1: If x is rational, then G is continuous at x.


This is evident because, if x is rational, g(x) is irrational, so R is con- 
tinuous at g(x). But g is assumed to be continuous everywhere, so the 
composite function G will be continuous at x. 


Claim 2: If y is irrational, then G is discontinuous at y. 


This is easily verified by choosing a sequence $\{x_k\}$ of rationals converging to y. Then

$\lim_{k\to\infty} G(x_k) = \lim_{k\to\infty} R(g(x_k)) = \lim_{k\to\infty} 0 = 0,$

because g carries each rational $x_k$ to an irrational $g(x_k)$, and the ruler function is zero at irrational points. On the other hand, $G(y) = R(g(y)) \neq 0$ because g(y) is rational. In short, $\lim_{k\to\infty} G(x_k) \neq G(y)$, and so G is discontinuous at y.

Taken together, these claims show that G is continuous upon the 
rationals and discontinuous upon the irrationals—a situation that 
Volterra had just proved to be impossible! It follows that a function 
like g cannot exist. There is no continuous transformation that carries 
rationals to irrationals and vice versa. Q.E.D. 


Among other things, these results remind us that the rationals and 
irrationals, although both dense sets of real numbers, are intrinsically 
noninterchangeable. As we saw, Cantor had highlighted the fact that the 
rationals are denumerable and the irrationals are not, but mathematicians 
would find other, more subtle distinctions between these systems. One of 
these was the notion of a set’s “category,” a concept due to Volterra’s gifted 
student René Baire, who is the subject of our next chapter. 

With this, we leave the 21-year-old Vito Volterra. A long and distin- 
guished career lay ahead of him, one that would see continued mathematical 
success, international recognition, and even an honorary knighthood from 
Britain’s King George V.

Looking back from later in his life, Volterra characterized the 1800s as “the century of the theory of functions” [10]. Starting with Euler’s initial
ideas, the concept of function had assumed a central role in the work of
Cauchy, Riemann, and Weierstrass and then been passed to the generation 
of Cantor, Hankel, and Volterra himself. Functions had come to dominate 
analysis, and their unexpected possibilities surprised mathematicians time 
and again. As we have seen, Volterra deserves a place in this tale for two 
different but fascinating discoveries from 1881. 

For such a young man, it had been quite a year. 


CHAPTER 13 




René Baire 


In his doctoral thesis of 1899, René Baire (1874-1932) assessed the
importance of set theory to mathematical analysis: 


One can even say, in a general manner, that . . . any problem relative to the theory of functions leads to certain questions relative to
the theory of sets and, insofar as these latter questions are or can 
be addressed, it is possible to resolve, more or less completely, the 
given problem [1]. 


As we shall see, Baire not only advocated this position but did a splendid 
job of practicing it. 

Unfortunately, his mathematical triumphs were confined to the brief 
periods when he was both physically and mentally sound. An introverted 
person of “delicate” health, Baire entered university in 1892, and his obvi- 
ous talents took him to Italy to study with Volterra [2]. After completing 
his dissertation, Sur les fonctions de variables réelles (On functions of real variables), Baire taught at the Universities of Montpellier (1902) and Dijon (1905). During this time, despite the occasional setback, Baire seemed able to cope.

But then a series of ailments destroyed his fragile constitution. He en- 
dured everything from restrictions of the esophagus to severe attacks of 
agoraphobia. By 1909 his teaching had deteriorated beyond repair, and in 
1914 he was given a leave of absence from Dijon. Baire would never re- 
turn to serious research. 

Instead, he spent his remaining years fighting physical and mental 
demons while burdened with sometimes crushing poverty. A colleague 
described him as “the type of man of genius who pays for that genius with 
a continual suffering due to an always unsteady constitution” [3]. In all, 
René Baire had only a dozen good years to devote to mathematics. 

In this chapter, we shall look back to his dissertation and the first ap- 
pearance of what is now known as the Baire category theorem. We begin, 
as did Baire, with the concept of a nowhere-dense set. 


NOWHERE-DENSE SETS


As noted earlier, a set of real numbers is dense if every open interval contains at least one member of the set. In modern notation, D is dense if, for any open interval $(\alpha, \beta)$, we have $(\alpha, \beta) \cap D \neq \emptyset$.

A set fails to be dense if there is an open interval containing no points of the set. For instance, we let E be the set of all positive rational numbers. This is not dense in the real line because the open interval (−2, 0) is free of points of E. However, E exhibits a “denseness” over part of its reach, for members of E are present in any open interval $(\alpha, \beta)$ where $0 < \alpha < \beta$.

In order to move beyond examples like this, that is, those that are 
dense in some regions but not in others, we introduce a new idea. 


Definition: A set P of real numbers is nowhere dense if every open interval $(\alpha, \beta)$ contains an open subinterval $(a, b) \subset (\alpha, \beta)$ such that $(a, b) \cap P = \emptyset$.


This means that, even though points of P might be found in a given interval $(\alpha, \beta)$, there is an entire subinterval within it that is free of such points (see figure 13.1). Nowhere-dense sets are thus regarded as being sparse or, to use the descriptive term of Hermann Hankel, “scattered” [4].
Figure 13.1 (points of P in $(\alpha, \beta)$; a subinterval $(a, b)$ with no points of P)


We note that “nowhere dense” is not the logical negation of “dense.” 
The nondense set E above, for instance, is not nowhere dense because 
the open interval (3, 4) contains no subinterval free of positive rationals. 
We thus would do well to provide a few examples of sets that are 
nowhere dense. 


1. The set consisting of a single point {c} is nowhere dense.
This is obvious, for if $(\alpha, \beta)$ is an open interval not containing c, then $(\alpha, \beta) \subset (\alpha, \beta)$ and $(\alpha, \beta) \cap \{c\} = \emptyset$. On the other hand, if $(\alpha, \beta)$ is an open interval containing c, then $(c, \beta) \subset (\alpha, \beta)$ and $(c, \beta) \cap \{c\} = \emptyset$.


2. The set $S = \{\frac{1}{n} \mid n \text{ is a whole number}\} = \{1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\}$ is nowhere dense.

This too is easy to see, for the gaps between reciprocals of two consecutive integers will furnish subintervals free of points of S. Even if a given open interval $(\alpha, \beta)$ contains 0, the point towards which these reciprocals are accumulating, we can choose a whole number N so that $\frac{1}{N} \in (\alpha, \beta)$ and take the open subinterval $\left(\frac{1}{N+1}, \frac{1}{N}\right) \subset (\alpha, \beta)$ with $\left(\frac{1}{N+1}, \frac{1}{N}\right) \cap S = \emptyset$, as shown in figure 13.2.
N+1N 


Figure 13.2 (no points of S in $\left(\frac{1}{N+1}, \frac{1}{N}\right)$)
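The argument for S is constructive, so it can be phrased as a procedure: given $(\alpha, \beta)$, return an open subinterval free of S. The Python sketch below (the name `gap_in_S` is hypothetical) uses exact fractions and a slightly different case split from the text: instead of locating some $1/N$ inside $(\alpha, \beta)$, it looks at the midpoint m and uses the gap $\left(\frac{1}{n+1}, \frac{1}{n}\right)$ that contains or ends at m.

```python
import math
from fractions import Fraction


def gap_in_S(alpha, beta):
    """Open subinterval (a, b) of (alpha, beta) containing no point of S = {1/n}.

    Assumes alpha < beta; uses exact Fractions throughout."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    m = (alpha + beta) / 2
    if m <= 0:
        # S lies in (0, 1], so nothing of S is below m <= 0.
        return (alpha, m)
    if m > 1:
        # Every point above 1 misses S.
        return (max(alpha, Fraction(1)), beta)
    n = math.floor(1 / m)  # then 1/(n+1) < m <= 1/n
    # Clip the gap (1/(n+1), 1/n) to (alpha, beta); the lower end stays below m
    # and the upper end is at least m, so the result is nonempty.
    return (max(alpha, Fraction(1, n + 1)), min(beta, Fraction(1, n)))


print(gap_in_S(Fraction(-1, 10), Fraction(1, 10)))
print(gap_in_S(Fraction(1, 3), Fraction(1, 2)))
```

Every returned interval sits inside one of the gaps between consecutive reciprocals (or entirely outside $(0, 1]$), which is the same observation the text relies on.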
3. The set $T = \{\frac{1}{r} + \frac{1}{k} \mid r \text{ and } k \text{ are whole numbers}\}$ is nowhere dense.

To conjure up a mental picture of this set, fix r and let k run through the positive integers. This generates points $\frac{1}{r} + 1, \frac{1}{r} + \frac{1}{2}, \frac{1}{r} + \frac{1}{3}, \frac{1}{r} + \frac{1}{4}, \ldots$, which cluster around $\frac{1}{r}$ in the same way that the points of the previous example clustered around 0. Because r is arbitrary, every reciprocal $\frac{1}{r}$ is such a cluster point, giving T quite a complicated structure. Nonetheless, the gaps among the points $\frac{1}{r} + \frac{1}{k}$ are such as to make T nowhere dense (we omit the details).


Before seeing what Baire made of this, we prove two simple lemmas 
that will come in handy. 


Lemma 1: Subsets of nowhere-dense sets are nowhere dense. That is, if P is a nowhere-dense set and $U \subset P$, then U is nowhere dense.


Proof: Given an open interval $(\alpha, \beta)$, we know there exists an open subinterval $(a, b) \subset (\alpha, \beta)$ with $(a, b) \cap P = \emptyset$. Because U is a subset of P, it is clear that $(a, b) \cap U = \emptyset$, and so U is nowhere dense as well. Q.E.D.


Lemma 2: The union of two nowhere-dense sets is nowhere dense. 


Proof: Let $P_1$ and $P_2$ be nowhere dense. To show that $P_1 \cup P_2$ is also nowhere dense, we begin with an open interval $(\alpha, \beta)$. Because $P_1$ is nowhere dense, there exists an open subinterval $(a, b) \subset (\alpha, \beta)$ with $(a, b) \cap P_1 = \emptyset$. But (a, b) is itself an open interval and $P_2$ is nowhere dense, so there is an open subinterval $(c, d) \subset (a, b) \subset (\alpha, \beta)$ with $(c, d) \cap P_2 = \emptyset$. Clearly, (c, d) is an open subinterval of $(\alpha, \beta)$ containing no points of $P_1$ or $P_2$. Thus, $(c, d) \cap (P_1 \cup P_2) = \emptyset$, so $P_1 \cup P_2$ is nowhere dense. Q.E.D.
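The proof of lemma 2 is really a two-step procedure, and it can be sketched in Python by modelling each nowhere-dense set through a hypothetical "gap finder": a function that, given $(\alpha, \beta)$, returns an open subinterval free of the set. For simplicity the sets below are finite, which makes each gap finder easy to write; the names `finite_gap` and `union_gap` are illustrative, not from the text.

```python
from fractions import Fraction


def finite_gap(points, alpha, beta):
    """Open subinterval of (alpha, beta) free of the finitely many given points."""
    inside = [p for p in points if alpha < p < beta]
    return (alpha, min(inside)) if inside else (alpha, beta)


def union_gap(gap1, gap2, alpha, beta):
    """Lemma 2 as a procedure: dodge P1 first, then dodge P2 inside that result."""
    a, b = gap1(alpha, beta)
    return gap2(a, b)


P1 = [Fraction(1, 4), Fraction(1, 2)]
P2 = [Fraction(1, 8), Fraction(3, 4)]
c, d = union_gap(
    lambda a, b: finite_gap(P1, a, b),
    lambda a, b: finite_gap(P2, a, b),
    Fraction(0),
    Fraction(1),
)
print(c, d)  # prints: 0 1/8
```

Composing gap finders this way works for any finite number of sets, which is the sense in which finite unions stay nowhere dense; it says nothing about infinite unions, where the composition never terminates.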


As this second lemma shows, we can amalgamate two—or for that 
matter any finite number—of nowhere-dense sets and still find ourselves 
with a nowhere-dense outcome. Even the union of a million such sets 
would remain, in Hankel’s terminology, scattered. 
But what if we assemble an infinitude of nowhere-dense sets? What
sort of structure might such a union have? And what use might this be to 
mathematical analysis? These are matters that Baire addressed with his 
characteristic ingenuity. 


THE BAIRE CATEGORY THEOREM 


In his thesis, Baire wrote of a set F with the property that 


there exists a denumerable infinity of sets $P_1, P_2, P_3, P_4, \ldots$, each nowhere dense, such that every point [of F] belongs to at least one of the sets $P_1, P_2, P_3, P_4, \ldots$. I will say a set of this nature is of the first category. [5]


In other words, F is a set of the first category if $F = P_1 \cup P_2 \cup P_3 \cup \cdots \cup P_k \cup \cdots$, where each $P_k$ is nowhere dense.

Many later mathematicians have been critical of Baire not for his ideas 
but for his terminology. The completely nondescriptive “first category” is 
about as colorless a term as there is and conjures up no image whatever in 
the mind’s eye. Such critics must have been further dismayed when they 
read on: “Any set which does not possess this property [first category] will 
be said to be of the second category.” 

It is clear that a denumerable set is of the first category. Such a set $\{a_1, a_2, a_3, a_4, \ldots\}$ can be written as the union of one-point sets

$\{a_1\} \cup \{a_2\} \cup \{a_3\} \cup \{a_4\} \cup \cdots,$


where, as we saw, each one-point set is nowhere dense. In particular, this 
means that the (denumerable) set of algebraic numbers is of the first cate- 
gory, as is its (denumerable) subset, the rationals. But the rationals form a 
dense set. So, whereas finite unions of nowhere-dense sets must remain 
nowhere dense, denumerable unions of such sets can grow sufficiently 
large to be everywhere dense. As Baire put it, a first category set “can evidently be of a different nature than the individual sets $P_k$” [6]. If we agree
that nowhere-dense sets are “small,” are we ready to conclude that first 
category sets are, for want of a better word, “large”? 

Before seeing what Baire had to say about this, we need a few more 
lemmas. 
Lemma 3: Any subset of a first category set is itself of the first category.


Proof: Let $F = P_1 \cup P_2 \cup P_3 \cup \cdots \cup P_k \cup \cdots$ be of the first category, where each $P_k$ is nowhere dense, and let $G \subset F$. Elementary set theory shows that

$G = G \cap F = (G \cap P_1) \cup (G \cap P_2) \cup \cdots \cup (G \cap P_k) \cup \cdots,$

where each $G \cap P_k$ is a subset of $P_k$, and so is nowhere dense by lemma 1. Because G is then a denumerable union of nowhere-dense sets, it is of the first category. Q.E.D.


We remark that lemma 3 implies that if S is a set of the second cate- 
gory and S CT, then T must also be of the second category. Just as shrink- 
ing a first category set yields another of that category, so too does enlarging 
a second category set result in another second category set. 


Lemma 4: The union of two first category sets is first category. 


Proof: Let F and H be of the first category. Then $F = P_1 \cup P_2 \cup P_3 \cup \cdots \cup P_k \cup \cdots$, where each $P_k$ is nowhere dense, and $H = R_1 \cup R_2 \cup \cdots \cup R_k \cup \cdots$, where each $R_k$ is nowhere dense. We shuffle these sets together to write

$F \cup H = (P_1 \cup R_1) \cup (P_2 \cup R_2) \cup \cdots \cup (P_k \cup R_k) \cup \cdots,$

and each set $P_k \cup R_k$ is nowhere dense by lemma 2. Thus, $F \cup H$ is the denumerable union of nowhere-dense sets and so is of the first category. Q.E.D.


Lemma 4 rests upon the fact that the union of two denumerable col- 
lections is denumerable, and we can extend this to three or four or any 
finite number of such collections. Better yet, the denumerable union of 
denumerable collections is denumerable, so we have the following 
lemma. 


Lemma 5: If $F_1, F_2, \ldots, F_k, \ldots$ is a denumerable collection of sets of the first category, then their union $F_1 \cup F_2 \cup \cdots \cup F_k \cup \cdots$ is of the first category as well.
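Lemma 5 is stated without proof. One standard way to see it (offered here as an illustration, not as the book's own argument) is to write each $F_i = P_{i,1} \cup P_{i,2} \cup \cdots$ with every $P_{i,j}$ nowhere dense, and then list the doubly indexed sets $P_{i,j}$ diagonal by diagonal, so that a single sequence contains every one of them. The Python sketch below generates that listing of index pairs and gives the closed-form position of each pair; the function names are hypothetical.

```python
from itertools import count, islice


def diagonal_pairs():
    """Yield every pair (i, j) of whole numbers >= 1, one diagonal i + j = s at a time."""
    for s in count(2):
        for i in range(1, s):
            yield (i, s - i)


def position(i, j):
    """0-based index at which diagonal_pairs() produces (i, j)."""
    s = i + j
    # Diagonals 2, ..., s-1 contribute 1 + 2 + ... + (s-2) = (s-2)(s-1)/2 earlier pairs.
    return (s - 2) * (s - 1) // 2 + (i - 1)


first_six = list(islice(diagonal_pairs(), 6))
print(first_six)  # [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```

Because `position` is finite for every pair, each $P_{i,j}$ appears at some definite place in the single list, and so the union of all the $F_i$ is a denumerable union of nowhere-dense sets.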
As noted, the dense set of rationals is of the first category, suggesting
that sets of this type may be “large.” But appearances are deceptive. In 
1899 Baire proved that a first category set must be, in a fundamental 
sense, “small.” To be precise, such a set is never sufficient to exhaust an 
open interval. It is this result that now carries his name. 


Theorem (Baire category theorem): If $F = P_1 \cup P_2 \cup P_3 \cup \cdots \cup P_k \cup \cdots$, where each $P_k$ is a nowhere-dense set, and if $(\alpha, \beta)$ is an open interval, then there exists a point in $(\alpha, \beta)$ that is not in F.


Proof: We begin with $(\alpha, \beta)$ and consider the nowhere-dense set $P_1$. By definition, there is an open subinterval of $(\alpha, \beta)$ containing no points of $P_1$. By shrinking this subinterval if necessary, we can find $a_1 < b_1$ such that the closed subinterval $[a_1, b_1] \subset (\alpha, \beta)$ and $[a_1, b_1] \cap P_1 = \emptyset$. (We remark that Baire, like Cantor and Volterra before him, did not emphasize the need for closed subintervals.)

But $(a_1, b_1)$ is itself an open interval and $P_2$ is nowhere dense, so in analogous fashion we have $a_2 < b_2$ with $[a_2, b_2] \subset (a_1, b_1) \subset [a_1, b_1] \subset (\alpha, \beta)$ and $[a_2, b_2] \cap P_2 = \emptyset$. Continuing in this way, we construct a descending sequence of closed intervals

$[a_1, b_1] \supset [a_2, b_2] \supset \cdots \supset [a_k, b_k] \supset \cdots,$

where $[a_k, b_k] \cap P_k = \emptyset$ for each $k \ge 1$.

By the nested interval version of the completeness property, there is at least one point c common to all of these intervals. To complete the proof, we need only show that c is a point of the open interval $(\alpha, \beta)$ not belonging to F.

First, because c is in all the closed intervals, $c \in [a_1, b_1] \subset (\alpha, \beta)$, and so c indeed lies within $(\alpha, \beta)$.

Second, for each $k \ge 1$, we know that c is in $[a_k, b_k]$ and that $[a_k, b_k]$ has no points in common with $P_k$. The point c, belonging to none of the $P_k$, cannot belong to their union, F.

We have thus found a point of $(\alpha, \beta)$ not contained in the first category set F. In short, a first category set cannot exhaust an open interval. Q.E.D.
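The construction in this proof can be carried out by machine when each $P_k$ is a one-point set $\{x_k\}$, the situation of Cantor's theorem revisited later in this chapter. In the Python sketch below, the enumeration order of the rationals and the use of two disjoint "fifths" of each interval are our own illustrative choices, and the function names are hypothetical. Each new closed interval sits inside the open interior of the previous one and misses the next listed rational in (0, 1).

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def rationals_in_unit_interval():
    """Enumerate the rationals in (0, 1): 1/2, 1/3, 2/3, 1/4, 3/4, 1/5, ..."""
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1


def nested_intervals(points, a, b):
    """Closed intervals [a_k, b_k], each inside the open (a_{k-1}, b_{k-1}),
    with [a_k, b_k] missing the k-th point (so P_k is the one-point set {x_k})."""
    intervals = []
    for x in points:
        w = b - a
        left = (a + w / 5, a + 2 * w / 5)  # two disjoint closed fifths,
        right = (a + 3 * w / 5, a + 4 * w / 5)  # both strictly inside (a, b)
        a, b = right if left[0] <= x <= left[1] else left
        intervals.append((a, b))
    return intervals


xs = list(islice(rationals_in_unit_interval(), 40))
ivs = nested_intervals(xs, Fraction(0), Fraction(1))
last_a, last_b = ivs[-1]
# The final interval avoids every one of the 40 listed rationals.
print(float(last_a), float(last_b))
```

A finite run only avoids finitely many points; it is completeness that supplies a point c common to all the intervals. Since every rational in (0, 1) eventually appears in the list, that c must be irrational.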
I begin by proving the following proposition: If P is a set of the first category, then in every portion $\alpha\beta$ of the segment on which it is defined there exists at least one point (and consequently infinitely many) not belonging to P. Indeed, by the hypotheses, one can determine in $\alpha\beta$ a finite interval $\alpha_1\beta_1$ containing no point of $P_1$; in $\alpha_1\beta_1$, an interval $\alpha_2\beta_2$ containing no point of $P_2$; etc.; in $\alpha_{n-1}\beta_{n-1}$, an interval $\alpha_n\beta_n$ containing no point of the first n sets $P_1, P_2, \ldots, P_n$. There exists at least one point M lying inside all of the segments $\alpha_n\beta_n$; this point M belongs to none of the sets $P_n$ and consequently does not belong to P.


The Baire category theorem (1899) 


This is the original proof of the Baire category theorem. His elegant ar- 
gument used the completeness property and did so in a manner reminis- 
cent of the result we have seen from his mentor Volterra. Baire continued: 


It follows immediately that any interval is a set of the second cat- 
egory; for we have just proved that one cannot obtain all points of 
a continuous interval by means of a denumerable infinity of 
nowhere dense sets [7]. 


From this we can deduce that the set of all real numbers is of the second category, for the reals contain within them the second category set (0, 1) (recall the remark following lemma 3). And this means that the set of irrationals is of the second category, for otherwise, both the rationals and the irrationals would be of the first category, as would be their union by lemma 4. But their union is all the real numbers, a second category set.

At this point, Baire contrasted sets of the first and second categories: 


One sees the profound difference that exists between sets of the 
two categories; this difference does not reside in their denumer- 
ability nor in their condensation within an interval, for a set of the 
first category can have the cardinality of the continuum and can 
be dense; but it is in some sense a combination of the two preced- 
ing notions [8]. 


From what is now called the topological viewpoint, the Baire category theorem shows that first category sets are in a sense negligible. Some authors who object to Baire’s colorless terminology use meager as a more suggestive alternative for “first category.” Whatever their names, Baire’s dichotomy would have important consequences for mathematical analysis, as the next section illustrates.


SOME APPLICATIONS 


A hallmark of mathematical progress is the fruitful generalization, one 
that gathers seemingly unrelated matters under a single umbrella. Such a 
generalization is both more efficient and more elegant than what came be- 
fore. The Baire category theorem is one of these, as is clear if we return to 
Cantor’s nondenumerability result from chapter 11. 


Cantor’s Theorem Revisited: If $\{x_k\}$ is a sequence of distinct real numbers, then any open interval $(\alpha, \beta)$ contains a point not included among the $\{x_k\}$.


Proof: The collection $\{x_1, x_2, x_3, \ldots, x_k, \ldots\}$, considered as a set of points, is denumerable and thus of the first category. Because Baire showed that a first category set cannot exhaust an open interval, $(\alpha, \beta)$ must contain a point other than the $\{x_k\}$. Q.E.D.


That was certainly easy. 

But there is more. Volterra’s major result from chapter 12 is also a consequence of Baire’s work. To see this, we need some background, including an immediate corollary of the category theorem.


Corollary: The complement of a first category set is dense. 


Proof: (Recall that the complement of a set of real numbers A, often denoted by $A^c$, is the set of real numbers not belonging to A.) Let F be of the first category and consider any open interval $(\alpha, \beta)$. Baire proved that not every point in $(\alpha, \beta)$ belongs to F, so $(\alpha, \beta) \cap F^c \neq \emptyset$, and this is precisely what is required to show that the complement of F is dense. Q.E.D.


We next wish to characterize pointwise discontinuous functions in 
terms of category, a quest that had led Baire to investigate category in the 
first place. In what follows, we join Baire in adopting the “inclusive” 
meaning of pointwise discontinuity, that is, continuity on a dense set. But 
our discussion differs from his original in that he employed the function's
oscillation, whereas we reach the same end by means of sequences [9]. 
Beginning with a function f and a whole number k, we define the set 


$P_k = \{x \mid \text{there is a sequence } a_j \to x \text{ with } |f(a_j) - f(x)| \ge \tfrac{1}{k} \text{ for all } j \ge 1\}$. (1)
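Because (1) is a statement about a whole sequence, a computer can only probe finitely many of its terms. Still, a short Python check makes the definition concrete for the function $f(x) = \cos(1/x)$ with $f(0) = 0$, the example examined next. The helper name `gap_holds` and its floating-point tolerance are our own assumptions. Along $a_j = 1/(2\pi j)$ a gap of at least 1 persists, while along $b_j = 1/(\pi/2 + j\pi)$, which also tends to 0, it does not; membership in $P_k$ only asks that some sequence work.

```python
import math


def f(x):
    """f(x) = cos(1/x) for x != 0 and f(0) = 0."""
    return math.cos(1 / x) if x != 0 else 0.0


def gap_holds(func, x, seq, k, tol=1e-9):
    """Finite check of the condition in (1): |func(a_j) - func(x)| >= 1/k for every
    listed a_j, allowing a small floating-point tolerance."""
    return all(abs(func(a) - func(x)) >= 1 / k - tol for a in seq)


# a_j = 1/(2*pi*j) tends to 0 and f(a_j) = cos(2*pi*j) = 1.
good = [1 / (2 * math.pi * j) for j in range(1, 21)]
# b_j = 1/(pi/2 + j*pi) also tends to 0, but f(b_j) is (numerically) 0.
bad = [1 / (math.pi / 2 + j * math.pi) for j in range(1, 21)]

print(gap_holds(f, 0.0, good, k=1), gap_holds(f, 0.0, bad, k=1))  # True False
```

A finite check like this can suggest membership in $P_k$ but cannot prove it; the proof is the exact computation with $\cos(2\pi j) = 1$.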


A real number x thus belongs to $P_k$ if we can approach x sequentially by means of $\{a_j\}$ in such a way that the functional values $f(a_j)$ and $f(x)$ are all separated by a gap of at least $1/k$. As an example, we again consider the function


$f(x) = \begin{cases} \cos(1/x) & \text{if } x \neq 0, \\ 0 & \text{if } x = 0 \end{cases}$


from chapter 10 and claim that 0 belongs to the set $P_1$. To verify this, we introduce the sequence $a_j = \frac{1}{2\pi j}$. Clearly $\lim_{j\to\infty} \frac{1}{2\pi j} = 0$, and for each $j \ge 1$ we have

$\left| f\!\left(\frac{1}{2\pi j}\right) - f(0) \right| = |\cos(2\pi j) - 0| = 1 \ge \frac{1}{1}.$

By the definition in (1), we see that $0 \in P_1$.

We are now ready to prove Baire’s characterization of pointwise discontinuity in terms of the “smallness” of $D_f$.

Theorem: f is (at worst) pointwise discontinuous if and only if $D_f$ is a set of the first category.


There are, of course, two implications to be proved. We begin with the 
more intricate necessary condition. 


Necessity: If f is (at worst) pointwise discontinuous, then $D_f$ is of the first category.


Proof: Our first object is to show that each $P_k$ as defined above is nowhere dense. We thus fix a whole number $k \ge 1$ and an open interval $(\alpha, \beta)$. By pointwise discontinuity, f is continuous at some point, call it $x_0$, within $(\alpha, \beta)$. This means that $\lim_{x\to x_0} f(x) = f(x_0)$, and so, for $\varepsilon = \frac{1}{2k}$, there exists a $\delta > 0$ such that the open interval $(x_0 - \delta, x_0 + \delta)$ is a subset of $(\alpha, \beta)$ and

if $|x - x_0| < \delta$, then $|f(x) - f(x_0)| < \frac{1}{2k}$. (2)


We assert that $(x_0 - \delta, x_0 + \delta) \cap P_k = \emptyset$. To prove this, suppose the opposite. Then there is some point z belonging to $(x_0 - \delta, x_0 + \delta) \cap P_k$. By the nature of $P_k$ there must be a sequence $a_j \to z$ with $|f(a_j) - f(z)| \ge \frac{1}{k}$ for all $j \ge 1$. Because the sequence $\{a_j\}$ converges to $z \in (x_0 - \delta, x_0 + \delta)$, there exists a subscript N so that $a_N \in (x_0 - \delta, x_0 + \delta)$. With some help from the triangle inequality, we conclude that

$\frac{1}{k} \le |f(a_N) - f(z)| = |f(a_N) - f(x_0) + f(x_0) - f(z)| \le |f(a_N) - f(x_0)| + |f(x_0) - f(z)| < \frac{1}{2k} + \frac{1}{2k} = \frac{2}{2k},$

where the last step follows from (2) and the fact that both $|a_N - x_0| < \delta$ and $|z - x_0| < \delta$. This chain of inequalities leaves us with the contradiction that $\frac{1}{k} < \frac{2}{2k} = \frac{1}{k}$. Something is amiss.


The trouble arose from the assumption that $(x_0 - \delta, x_0 + \delta) \cap P_k$ is nonempty. We conclude instead that $(x_0 - \delta, x_0 + \delta)$ is a subinterval of $(\alpha, \beta)$ that contains no points of $P_k$. By definition $P_k$ is nowhere dense for each k, and this in turn means that $P_1 \cup P_2 \cup \cdots \cup P_k \cup \cdots$ is a set of the first category.

We are nearly done. We need only apply the notion of continuity— 
or, more precisely, of discontinuity—to see that 


$D_f \subset P_1 \cup P_2 \cup \cdots \cup P_k \cup \cdots$. (3)


Expression (3) follows because if $x \in D_f$ is any point of discontinuity of f, then there exists an $\varepsilon > 0$ so that, for any $\delta > 0$, we can find a point z with $0 < |z - x| < \delta$ yet $|f(z) - f(x)| \ge \varepsilon$. We then choose a whole number k with $\frac{1}{k} < \varepsilon$ and let δ equal, in turn, $1, \frac{1}{2}, \frac{1}{3}, \ldots$ to generate points $a_1, a_2, a_3, \ldots, a_j, \ldots$ with $0 < |a_j - x| < \frac{1}{j}$ but $|f(a_j) - f(x)| \ge \varepsilon > \frac{1}{k}$. The sequence $\{a_j\}$ converges to x, yet for all $j \ge 1$, we have $|f(a_j) - f(x)| \ge \frac{1}{k}$. By the definition in (1), the discontinuity point x belongs to the nowhere-dense set $P_k$, and so, indeed, $D_f \subset P_1 \cup P_2 \cup \cdots \cup P_k \cup \cdots$.

We wrap up this half of the proof by noting that $D_f$, a subset of the first category set $P_1 \cup P_2 \cup \cdots \cup P_k \cup \cdots$, is itself first category by lemma 3. Therefore, if f is pointwise discontinuous, then $D_f$ is a set of the first category.


Sufficiency: If $D_f$ is of the first category, then f is (at worst) pointwise discontinuous.


Proof: This is an immediate consequence of the corollary to the Baire category theorem that we introduced earlier. Because $D_f$ is of the first category, its complement is dense. In other words, $D_f^c = C_f = \{x \mid f \text{ is continuous at } x\}$ is dense, which is precisely what is required for f to be at worst pointwise discontinuous. Q.E.D.


Thus the pointwise discontinuous functions are those whose assembled discontinuities remain “small” in the sense of being of the first category. This characterization reduced Hankel’s thirty-year-old notion of pointwise discontinuity to a simple condition on the set $D_f$. Besides having its own intrinsic value, it allowed Baire to give an elegant proof of Volterra’s theorem from the previous chapter [10].


Volterra’s Theorem Revisited: There do not exist two pointwise dis- 
continuous functions on the interval (a, b) for which the continuity 
points of one are the discontinuity points of the other, and vice 
versa. 


Proof: Suppose for the sake of argument that f and φ were two such functions. The previous theorem shows that both $D_f$ and $D_\varphi$ are of the first category, and so too is $D_f \cup D_\varphi$ by lemma 4. By the corollary to the Baire category theorem, the complement of this union is dense. But the complement in question is the set of points at which neither function is discontinuous, that is, the set of their common points of continuity. We have reached a contradiction, for the hypothesis $C_f = D_\varphi$ leaves f and φ no common point of continuity, yet they share not just a single point of continuity but a dense set of them. Q.E.D.


And, with little additional effort, Baire provided the following dra- 
matic extension [11]. 


Theorem: If $f_1, f_2, \ldots, f_n, \ldots$ is a sequence of (at worst) pointwise discontinuous functions defined on a common interval, then there is a point, indeed a dense set of points, at which all of these are simultaneously continuous.


Proof: As in the preceding proof, we consider $D_{f_n}$, the set of discontinuity points of the function $f_n$. By pointwise discontinuity, each of these is of the first category, and so their union $D_{f_1} \cup D_{f_2} \cup \cdots \cup D_{f_n} \cup \cdots$ is of the first category by lemma 5. Again, the complement of this union is dense, but this complement is $C_{f_1} \cap C_{f_2} \cap \cdots \cap C_{f_n} \cap \cdots$, the points where all the functions are continuous at once. Q.E.D.


This theorem shows that even though pointwise discontinuous func- 
tions can have infinitely many discontinuities, and even though we assem- 
ble a denumerable infinitude of such functions, enough continuity remains 
to guarantee that they share a dense set of points where all are continuous. 
This represents a perfect fusion of set theory and analysis, blended together 
under the watchful eye of René Baire. 

Before leaving this section, we mention a last consequence Baire drew 
from his great theorem, one that led him to another lasting innovation 
[12]. 


Theorem: The uniform limit of pointwise discontinuous functions is 
pointwise discontinuous. 


Here he began with a sequence $f_1, f_2, \ldots, f_k, \ldots$ of pointwise discontinuous functions defined on a common interval and assumed they converged uniformly to a function $f$. As we have seen, uniform convergence as
described by Weierstrass was sufficiently strong to transfer certain proper- 
ties from individual functions to their limit. Baire established that “point- 
wise discontinuity” was one such property. 

Although omitting details, we give a sense of his argument. Under 
uniform convergence, Baire showed that any common point of continuity 
of the individual functions $f_k$ must be a point of continuity of the limit function $f$. To put this in set-theoretic notation, he proved


$$C_{f_1} \cap C_{f_2} \cap \cdots \cap C_{f_k} \cap \cdots \subseteq C_f.$$


As we just saw, Baire knew that this denumerable intersection was dense, and so $C_f$ must be dense as well. Then the uniform limit $f$, being continuous on a dense set, was pointwise discontinuous as claimed.

The fact that uniform limits of pointwise discontinuous functions must 
be pointwise discontinuous led Baire to wonder what, if anything, could be 
said about nonuniform limits. His reflections produced a new taxonomy of 
functions, much more sophisticated than Hankel’s from a quarter-century 
earlier. We end the chapter with a discussion of these ideas. 


THE BAIRE CLASSIFICATION OF FUNCTIONS 


In the hope of categorizing functions into logically meaningful classes, 
Baire, like Hankel, took the continuous ones as his starting point. “I choose 
to say that the continuous functions constitute class 0,” he wrote, in the 
process solidifying his reputation for colorless terminology [13]. 

Suppose we have a sequence of continuous, that is, class 0, functions $\{f_k\}$, and let $f(x) = \lim_{k\to\infty} f_k(x)$ be their pointwise limit. As we saw, $f$ may or may not be continuous. In the latter case, the limit function has escaped from class 0, so Baire was ready with a new class. “Those discontinuous functions that are limits of continuous functions,” he wrote, “form class 1.” As an example, we recall from chapter 9 that each function $f_k(x) = (\sin x)^k$ is continuous on $[0, \pi]$, but $f(x) = \lim_{k\to\infty} f_k(x)$ is discontinuous at $\pi/2$. So $f$ belongs to class 1.
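The chapter 9 example can be watched numerically. The short Python sketch below (the helper name `f_k` is our own, chosen for illustration) evaluates $(\sin x)^k$ for growing $k$: at $x = \pi/2$ the values stay at 1, while at other points of $[0, \pi]$ they collapse toward 0. Floating-point output of this kind only suggests the limit; it proves nothing.

```python
import math

def f_k(x: float, k: int) -> float:
    """k-th function of the sequence (sin x)^k, each continuous on [0, pi]."""
    return math.sin(x) ** k

if __name__ == "__main__":
    # Away from pi/2 the powers shrink toward 0; at pi/2, sin x = 1 so they stay at 1.
    for x in (0.5, 1.0, math.pi / 2, 2.5):
        print(f"x = {x:.4f}:", [f_k(x, k) for k in (1, 10, 100, 1000)])
```

The jump between the values near $\pi/2$ and the value at $\pi/2$ itself is exactly the discontinuity that pushes the limit out of class 0.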

Baire proved something far more interesting: that functions in class 1 
are at worst pointwise discontinuous [14]. That is, when we take a limit of 
continuous functions, the outcome need not be continuous everywhere 
but must at least be continuous on a dense set. Taking limits of continu- 
ous functions, then, cannot obliterate all vestiges of continuity. On the 
contrary, such limits retain a “respectable” amount of continuity from the 
originals. For those seeking a permanence in analysis, there is some com- 
fort in that conclusion. 

One consequence is the following. 


Theorem: If f is differentiable, then its derivative f’ must be continuous 
on a dense set. 
Proof: For each $k \ge 1$, we define a function
$$f_k(x) = \frac{f(x + 1/k) - f(x)}{1/k}.$$
The differentiability of $f$ implies its continuity, so each $f_k$ is continuous as well. But
$$\lim_{k\to\infty} f_k(x) = \lim_{k\to\infty} \frac{f(x + 1/k) - f(x)}{1/k} = f'(x)$$
because $1/k \to 0$ as $k \to \infty$. Thus $f'$ is the pointwise limit of a sequence of functions from class 0 and so belongs to class 0 (in which case it is continuous) or to class 1 (in which case it is pointwise discontinuous). Either way, derivatives must be continuous on a dense set. Q.E.D.
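The key step of the proof, that the continuous difference quotients $f_k$ converge pointwise to $f'$, can be illustrated with a minimal Python sketch. The helper `difference_quotient` is hypothetical, and we take $f = \sin$ purely because its derivative $\cos$ is known, so the error can be measured.

```python
import math
from typing import Callable

def difference_quotient(f: Callable[[float], float], k: int) -> Callable[[float], float]:
    """Return f_k(x) = (f(x + 1/k) - f(x)) / (1/k), which is continuous whenever f is."""
    h = 1.0 / k
    return lambda x: (f(x + h) - f(x)) / h

if __name__ == "__main__":
    x = 0.5
    for k in (10, 100, 1000, 10**5):
        approx = difference_quotient(math.sin, k)(x)
        # The gap to cos(x) = f'(x) shrinks roughly in proportion to 1/k.
        print(k, approx, abs(approx - math.cos(x)))
```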


We have previously seen that a differentiable function may have a dis- 
continuous derivative, but we can now answer the big question, “How 
discontinuous can a derivative really be?” Thanks to Baire, the answer is, 
“Not very, for it must be continuous on a dense set.” 

Meanwhile, he continued his classification scheme: 


Now suppose one has a sequence of functions belonging to 
classes 0 or 1 and having a limit function not belonging to either 
of these two classes. I will say that this limit function is of the sec- 
ond class, and the set of all functions that can be obtained in this 
manner will form class 2 [15]. 


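To establish that there is something in class 2, we define a function

$$D(x) = \lim_{k\to\infty}\Big[\lim_{j\to\infty} (\cos k!\pi x)^{2j}\Big]$$

and claim that, all appearances to the contrary, this is nothing but Dirichlet’s function,

$$d(x) = \begin{cases} 1 & \text{if } x \text{ is rational},\\ 0 & \text{if } x \text{ is irrational}. \end{cases}$$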


We should take a moment to verify this claim. Note first that if $x = p/q$ is a rational in lowest terms, then for any $k \ge q$, the expression $k!\pi x = \left[k!\,\frac{p}{q}\right]\pi$ is an integer multiple of $\pi$. Thus, for each $k$ after a certain point,

$$\lim_{j\to\infty}(\cos k!\pi x)^{2j} = \lim_{j\to\infty}(\pm 1)^{2j} = 1, \quad\text{and so}\quad D(x) = \lim_{k\to\infty}\Big[\lim_{j\to\infty}(\cos k!\pi x)^{2j}\Big] = \lim_{k\to\infty} 1 = 1$$

as well. On the other hand, if $x$ is irrational, then $k!\pi x$ cannot be an integer multiple of $\pi$, and it follows that $|\cos k!\pi x| < 1$. Consequently, for each $k$,

$$\lim_{j\to\infty}(\cos k!\pi x)^{2j} = 0, \quad\text{and so}\quad D(x) = \lim_{k\to\infty}\Big[\lim_{j\to\infty}(\cos k!\pi x)^{2j}\Big] = \lim_{k\to\infty} 0 = 0.$$


Because D equals 1 at each rational and 0 at each irrational, it is indeed 
Dirichlet’s function traveling incognito. 
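The rational half of this verification rests on one arithmetic fact: once $k \ge q$, the number $k!\,p/q$ is an integer, so $(\cos k!\pi x)^{2j} = 1$ for every $j$. A minimal Python sketch checks this exactly with the standard `fractions` module (the helper name is ours). The irrational half cannot be tested this way, since every floating-point number is itself rational.

```python
from fractions import Fraction
from math import factorial

def kfact_times_x_is_integer(x: Fraction, k: int) -> bool:
    """True when k! * x is an integer, i.e. when k! * pi * x is an integer multiple of pi."""
    return (factorial(k) * x).denominator == 1

if __name__ == "__main__":
    x = Fraction(5, 7)  # p/q in lowest terms with q = 7
    print([k for k in range(1, 12) if kfact_times_x_is_integer(x, k)])
```

For $5/7$ the property switches on exactly at $k = 7 = q$; for other fractions (such as $3/8$, where $4! = 24$ already works) it can switch on earlier, which is consistent with the text's "for any $k \ge q$."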

What makes this intriguing is the analytic nature of D. When it was 
introduced early in the nineteenth century, Dirichlet’s function seemed so 
pathological as to lie beyond the frontier of analysis. Yet here we see it as 
nothing worse than the double limit of some well-behaved cosines. 

Moreover, for each $k$ and $j$, the function $(\cos k!\pi x)^{2j}$ is continuous, so
Dirichlet’s function is seen to be the pointwise limit of the pointwise limits 
of continuous functions. This places it in class 0, class 1, or class 2. But we 
know that d is discontinuous everywhere and so does not belong to class 
0 (which requires continuity) nor to class 1 (which requires continuity on 
a dense set). The only alternative is that Dirichlet’s function resides in 
Baire’s second class. 

Baire was just getting warmed up. A function that is the pointwise 
limit of those from classes 0, 1, and 2 but does not belong to any of these 
classes is said to be in class 3. A limit of functions from classes 0, 1, 2, or 
3 that escapes these will be in class 4. And on it goes. In the end we have 
an unimaginably vast tower of functions, beginning with continuous ones 
and evolving via repeated limits into ever more peculiar entities. 

Needless to say, Baire’s classification raised a host of questions. For in- 
stance, how can we be sure there are any functions in class 247? And are 
there functions so bizarre as to belong to no Baire class at all? It was Baire’s 
contemporary, Henri Lebesgue, who proved that the answer to both of 
these questions is a resounding “yes” [16]. 

Although ill health brought his career to an abrupt end, René Baire 
carved out a share of mathematical immortality. He introduced the di- 
chotomy between first and second category sets, proved and exploited his 
powerful category theorem, and provided a classification of functions that 
seemed to extend the boundaries of analysis to the far horizon. 

As historian Thomas Hawkins observed, Baire’s remarkable discover- 
ies showed that, even at the threshold of the twentieth century, the calcu- 
lus was still generating wonderful new problems [17]. In this regard, 
Lebesgue wrote of Baire’s “rich imagination and solid critical sense” and 
continued, 


Baire showed us how to investigate these matters; which problems 
to pose, which notions to introduce. He taught us to consider the 
world of functions and to discern there the true analogies, the 
genuine differences. In absorbing the observations that Baire made,
one becomes a keen observer, learning to analyse commonplace 
ideas and to reduce them to notions more hidden, more subtle, but 
also more effective. 


In the end, Lebesgue called Baire “a mathematician of the highest class,” 
an impressive testimonial from one great analyst to another [18]. 

We conclude by returning to the chapter's opening passage: “Any 
problem relative to the theory of functions leads to certain questions rela- 
tive to the theory of sets.” As we have seen, Baire lived by this motto. In- 
sofar as modern analysis has embraced his position, he deserves a large 
debt of gratitude. 
As the nineteenth century became the twentieth, mathematicians
had reason to congratulate themselves. The calculus had been around for 
over two centuries. Its foundations were no longer suspect, and many of 
its open questions had been resolved. Analysis had come a long way since 
the early days of Newton and Leibniz. 

Then Henri Lebesgue (1875-1941) entered the picture. He was a bril- 
liant doctoral student at the Sorbonne when, in 1902, he revolutionized 
integration theory and, by extension, real analysis itself. He did so with a 
dissertation that has been described as “one of the finest which any math- 
ematician has ever written” [1]. 

To get a sense of his achievement, we conduct a quick review of Rie- 
mann’s integral before examining Lebesgue’s ingenious alternative. 
RIEMANN REDUX


In previous chapters we have highlighted certain “flaws” in the Rie- 
mann integral. Some statements that mathematicians had expected to be 
true required additional hypotheses to render them valid. Both the funda- 
mental theorem of calculus and the interchange of limits and integrals 
were false without assumptions that seemed overly restrictive. 

For this latter situation, our counterexample from chapter 9 involved 
a sequence of functions spiking ever higher. One might argue that the 
limit/integral interchange failed in that situation because the functions 
were not uniformly bounded. But the flaw runs deeper, as is evident from 
the following example. 

Begin with the set of the rational numbers in $[0, 1]$, which we shall denote by $Q_1$. Their denumerability allows us to list them as $Q_1 = \{r_1, r_2, r_3, r_4, \ldots\}$. We then define a sequence of functions


$$\phi_k(x) = \begin{cases} 1 & \text{if } x = r_1, r_2, \ldots, r_k,\\ 0 & \text{otherwise}. \end{cases}$$


Here, $\phi_k$ takes the value 1 at each of the first $k$ rationals from the list and takes the value 0 elsewhere. Each such function is bounded with $|\phi_k(x)| \le 1$, and each, equaling zero except at finitely many points, is integrable with $\int_0^1 \phi_k(x)\,dx = 0$.

But what about $\lim_{k\to\infty} \phi_k(x)$? Because any rational number $x$ lies somewhere on the list, $\phi_k(x)$ will eventually assume, and then maintain, a value of 1 as $k \to \infty$. And, if $x$ is irrational, $\phi_k(x) = 0$ for all $k$. In other words,


$$\lim_{k\to\infty} \phi_k(x) = \begin{cases} 1 & \text{if } x \text{ is rational},\\ 0 & \text{if } x \text{ is irrational}. \end{cases}$$


What we have, of course, is Dirichlet’s function, and so, although each $\phi_k$ is integrable, their pointwise limit is not. The nonintegrability of Dirichlet’s function shows that, by default,

$$\lim_{k\to\infty} \int_0^1 \phi_k(x)\,dx \ne \int_0^1 \Big[\lim_{k\to\infty} \phi_k(x)\Big]\,dx. \qquad (1)$$

This means that our problem with interchanging limits and integrals cannot be explained away by the unboundedness present in the example of chapter 9.
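The functions $\phi_k$ can be made concrete with a minimal Python sketch. The text does not fix a particular listing $r_1, r_2, \ldots$ of $Q_1$, so the listing below (by increasing denominator, skipping repeats) is our own assumption, as are the helper names. The sketch handles only rational inputs; at an irrational $x$ every $\phi_k$ is simply 0.

```python
from fractions import Fraction
from itertools import count, islice
from typing import Iterator

def rationals_in_unit_interval() -> Iterator[Fraction]:
    """One possible listing r_1, r_2, ... of the rationals in [0, 1], by denominator."""
    seen = set()
    for q in count(1):
        for p in range(q + 1):
            r = Fraction(p, q)  # Fraction reduces p/q to lowest terms
            if r not in seen:
                seen.add(r)
                yield r

def phi(k: int, x: Fraction) -> int:
    """phi_k(x) = 1 if x is among r_1, ..., r_k, and 0 otherwise."""
    return 1 if x in set(islice(rationals_in_unit_interval(), k)) else 0

if __name__ == "__main__":
    # With this listing, 2/5 is r_9, so phi_k(2/5) switches from 0 to 1 at k = 9 and stays there.
    print([phi(k, Fraction(2, 5)) for k in range(1, 13)])
```

Each $\phi_k$ is nonzero at only $k$ points, yet every rational is eventually switched on, which is how the limit becomes Dirichlet's function.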
Even as these issues were being considered, there remained the question of how to characterize Riemann integrability in terms of discontinuity.
In the notation of the previous chapter, mathematicians hoped to finish 
this sentence: 


A bounded function $f$ is Riemann integrable on $[a, b]$
if and only if $D_f$ is ____________. (2)


Everyone believed that the blank would be filled by some kind of “smallness” condition on $D_f$, the set of points of discontinuity. It was evident that
this missing condition was not “finite” nor “denumerable” nor “first cate- 
gory,” but its identity remained uncertain. Whoever filled in the blank by 
connecting continuity and Riemann integrability would make a very big 
splash indeed. 

It was Lebesgue who settled all these scores. He did so by returning to 
the concepts of length and area, viewing them from a fresh perspective, 
and thereby providing an alternative definition of the integral. The story 
begins with what we now call “Lebesgue measure.” 


MEASURE ZERO 


In a 1904 monograph, Leçons sur l’intégration, that grew out of his dissertation, Lebesgue described his initial goal: “I wish first of all to attach to
sets numbers that will be the analogues of their lengths” [2]. 

He started simply enough. The length of any of the four intervals $[a, b]$, $(a, b]$, $[a, b)$, and $(a, b)$ is $b - a$. If a set is the union of two disjoint intervals, that is, if $A = [a, b] \cup [c, d]$ where $b < c$, then we naturally let the “length” of $A$ be $(b - a) + (d - c)$. In similar fashion, we could provide a length for any finite union of disjoint intervals.

But Lebesgue had in mind considerably more complicated sets. For instance, how should we extend the concept of length to an infinite set like $S = \{1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\}$ that we proved to be nowhere dense in chapter 13? Or how would we measure the “length” of the set of irrational numbers contained in the unit interval $[0, 1]$?

Mathematicians before Lebesgue had asked these questions. In the 1880s, Axel Harnack (1851-1888) introduced what we now call the outer content of a bounded set [3]. Given such a set, he began by enclosing it within a covering of finitely many intervals and using the sum of their
lengths as an approximation to the set’s outer content. For $S$ above, we might consider a first, fairly crude cover by a few intervals, the sum of whose lengths is 0.9103.
We could refine this estimate by taking a different covering. For 
instance, suppose we cover S by the union of five subintervals 


$$S \subseteq (0, 0.2001) \cup (0.2499, 0.2501) \cup (0.3332, 0.3334) \cup (0.4999, 0.5001) \cup (0.9999, 1.0001).$$


Although this looks a bit strange, our strategy should be clear (see figure 
14.1). The left-most interval (0, 0.2001) contains all points of S except for 
1/4, 1/3, 1/2, and 1, and each of these has been surrounded by its own 
narrow interval. For this covering, the sum of the lengths is 0.2001 + 0.0002 + 0.0002 + 0.0002 + 0.0002 = 0.2009, a much smaller number than our first value 0.9103.
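The arithmetic for this second covering is easy to confirm exactly. The Python sketch below stores the five intervals as exact fractions (via the standard `fractions` module; the names `COVER`, `covered`, and `total_length` are ours), sums their lengths, and spot-checks that the points $1, \frac{1}{2}, \ldots, \frac{1}{10000}$ of $S$ all lie inside. A finite spot check is only an illustration; the text's argument is what covers every point of $S$.

```python
from fractions import Fraction as F

# The five open intervals from the text, stored exactly.
COVER = [
    (F("0"), F("0.2001")),
    (F("0.2499"), F("0.2501")),
    (F("0.3332"), F("0.3334")),
    (F("0.4999"), F("0.5001")),
    (F("0.9999"), F("1.0001")),
]

def covered(x: F) -> bool:
    """True if x lies in one of the open intervals of COVER."""
    return any(a < x < b for a, b in COVER)

def total_length() -> F:
    """Sum of the lengths of the intervals in COVER."""
    return sum((b - a for a, b in COVER), F(0))

if __name__ == "__main__":
    print(all(covered(F(1, n)) for n in range(1, 10_001)))  # finite spot check of S
    print(float(total_length()))
```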

At this point, Harnack advanced a bold idea: cover a bounded set E by 
finitely many intervals in all possible ways, sum the lengths of the intervals 
in each covering, and define the outer content $c_e(E)$ to be the limit of such
sums as the length of the widest interval goes to zero. 

There was much to recommend this definition. For instance, the outer 
content of a bounded interval turned out to be its length—exactly as 
one would hope. Likewise, the outer content of the single point {a} must 
be zero, because for any whole number $k$, we can cover the set $\{a\}$ by the single interval $\left[a - \frac{1}{2k}, a + \frac{1}{2k}\right]$ of length $\frac{1}{k}$. As $k$ grows ever larger, this length tends downward toward zero and so $c_e(\{a\}) = 0$. Again, this is as expected.

Harnack could also find the outer content of an infinite set like $S$. His approach is suggested by our second covering above. For any $\varepsilon > 0$, we note that the interval $\left[0, \frac{\varepsilon}{2}\right)$ contains all but finitely many points of $S$, which we denote by $\frac{1}{N}, \frac{1}{N-1}, \ldots, \frac{1}{2}$, and 1. We then include each of these $N$ points in a tiny interval of width $\frac{\varepsilon}{4N}$. For example, we could place $\frac{1}{k}$ within $\left[\frac{1}{k} - \frac{\varepsilon}{8N}, \frac{1}{k} + \frac{\varepsilon}{8N}\right]$. Together these intervals cover $S$, and the sum of their lengths is

$$\frac{\varepsilon}{2} + N \cdot \frac{\varepsilon}{4N} = \frac{\varepsilon}{2} + \frac{\varepsilon}{4} = \frac{3\varepsilon}{4} < \varepsilon.$$

Because, for each $\varepsilon > 0$, $S$ lies within finitely many intervals of total length less than $\varepsilon$, we conclude that $c_e(S) = 0$. We have here an infinite, nowhere-dense set of zero outer content.

Figure 14.1
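Harnack's covering can also be built and checked mechanically. In the Python sketch below (function names are ours; we assume $0 < \varepsilon \le 1$ so that $N \ge 2$), $N = \lfloor 2/\varepsilon \rfloor$ counts the points $\frac{1}{n} \ge \frac{\varepsilon}{2}$, the first interval is the half-open $\left[0, \frac{\varepsilon}{2}\right)$, and each of the $N$ exceptional points gets its own interval of width $\frac{\varepsilon}{4N}$. Exact fractions make the total length come out to exactly $\frac{3\varepsilon}{4}$.

```python
from fractions import Fraction as F
from math import floor

def harnack_cover(eps: F) -> list[tuple[F, F]]:
    """Finite cover of S = {1, 1/2, 1/3, ...} built as in the text; assumes 0 < eps <= 1."""
    N = floor(2 / eps)          # the points 1/N, ..., 1/2, 1 satisfy 1/n >= eps/2
    cover = [(F(0), eps / 2)]   # the half-open interval [0, eps/2) holds every other point
    half = eps / (8 * N)        # so each tiny interval has width eps/(4N)
    cover += [(F(1, k) - half, F(1, k) + half) for k in range(1, N + 1)]
    return cover

def covers(cover: list[tuple[F, F]], x: F) -> bool:
    """Membership test: the first interval is half-open, the rest are closed."""
    a0, b0 = cover[0]
    return a0 <= x < b0 or any(a <= x <= b for a, b in cover[1:])

def total_length(cover: list[tuple[F, F]]) -> F:
    return sum((b - a for a, b in cover), F(0))

if __name__ == "__main__":
    eps = F(1, 100)
    c = harnack_cover(eps)
    print(len(c), total_length(c), total_length(c) < eps)
```

However small $\varepsilon$ is chosen, the cover stays finite (here $N + 1$ intervals), which is exactly what Harnack's definition of content requires.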

But Harnack confronted a different situation with the set $Q_1$ of rationals in $[0, 1]$: an infinite, dense set. He recognized that any covering of $Q_1$ by a finite number of intervals will of necessity cover all of $[0, 1]$. Hence $c_e(Q_1) = 1$. That is, the outer content of all rationals in the unit interval is the same as the outer content of the unit interval itself.

In some ways, this seemed to make sense, but in others it appeared 
problematic. For if we let $I_1$ be the set of irrationals in $[0, 1]$, identical reasoning shows that $c_e(I_1) = 1$ as well. Because the union of the disjoint sets $Q_1$ and $I_1$ is the entire interval $[0, 1]$, we see that

$$c_e(Q_1 \cup I_1) = c_e[0, 1] = 1 \quad\text{yet}\quad c_e(Q_1) + c_e(I_1) = 1 + 1 = 2.$$


Apparently, we cannot decompose a set into disjoint subsets and sum their 
outer contents to get the outer content of the original. Such nonadditivity 
was an unwelcome feature of Harnack’s theory of content. 

The promise of extending the concept of length to nonintervals was 
sufficient to lead others to modify the definition so as to eliminate the 
attendant problems. Many mathematicians contributed to this discussion, 
but history credits Lebesgue with its final resolution. He defined a set to 
be of measure zero if it “can be enclosed in a finite or a denumerable infini- 
tude of intervals whose total length is as small as we wish” [4]. Thus a set 
$E$ is of measure zero, written $m(E) = 0$, if for any $\varepsilon > 0$, we can enclose

$$E \subseteq (a_1, b_1) \cup (a_2, b_2) \cup \cdots \cup (a_k, b_k) \cup \cdots, \quad\text{where}\quad \sum_{k=1}^{\infty} (b_k - a_k) < \varepsilon.$$

The innovation here is that Lebesgue, unlike Harnack, permitted coverings by a denumerable infinitude of intervals, and this made a world of difference.

It is obvious from the definitions that any subset of a set of measure 
zero must itself be of measure zero. It is equally clear that a set with outer 
content zero has measure zero as well. Thus, single points and the set $S$
above are of measure zero. But the converse fails—and fails spectacularly— 
as Lebesgue showed when he proved the following. 


Theorem: If a set $E = E_1 \cup E_2 \cup \cdots \cup E_k \cup \cdots$ is the denumerable union of sets of measure zero, then $E$ is a set of measure zero also [5].


Proof: Let $\varepsilon > 0$ be given. By hypothesis, we can enclose $E_1$ in a denumerable collection of intervals of combined length less than $\frac{\varepsilon}{2}$; we can enclose $E_2$ in a denumerable collection of intervals of combined length less than $\frac{\varepsilon}{4}$; and in general we enclose $E_k$ in a denumerable collection of intervals of combined length less than $\frac{\varepsilon}{2^k}$. The given set $E$ is then a subset of the union of all these intervals which, being the denumerable union of denumerable collections, is itself a denumerable collection whose combined length is less than

$$\frac{\varepsilon}{2} + \frac{\varepsilon}{4} + \frac{\varepsilon}{8} + \cdots + \frac{\varepsilon}{2^k} + \cdots = \varepsilon.$$

Because $E$ has been enclosed in a denumerable collection of intervals having combined length less than the arbitrarily small number $\varepsilon$, we see that $E$ has measure zero. Q.E.D.


It follows that any denumerable set is of measure zero, for such a set 
can be written as the (denumerable) union of its individual points. 
In particular, the set of rational numbers in $[0, 1]$ (the dense set labeled $Q_1$ above) has measure zero. Because $m(Q_1) = 0$ but $c_e(Q_1) = 1$, it is
evident that zero outer content and zero measure are fundamentally 
different. 
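The contrast between $m(Q_1) = 0$ and $c_e(Q_1) = 1$ comes down to the geometric budget in the proof above. The Python sketch below is a finite illustration only: it covers the first $n$ rationals of one particular listing (by increasing denominator, our own choice) and gives the $k$th one an open interval of length $\varepsilon/2^{k+1}$. That budget is our variant of the text's $\varepsilon/2^k$, chosen so the total stays strictly below $\varepsilon/2$. The names are hypothetical helpers.

```python
from fractions import Fraction as F
from itertools import count, islice
from typing import Iterator

def rationals_in_unit_interval() -> Iterator[F]:
    """List the rationals of [0, 1] by increasing denominator, skipping repeats."""
    seen: set[F] = set()
    for q in count(1):
        for p in range(q + 1):
            r = F(p, q)
            if r not in seen:
                seen.add(r)
                yield r

def cover_first_rationals(eps: F, n: int) -> list[tuple[F, F]]:
    """Put the k-th rational r_k inside an open interval of length eps / 2**(k+1)."""
    cover = []
    for k, r in enumerate(islice(rationals_in_unit_interval(), n), start=1):
        half = eps / 2 ** (k + 2)  # half-width, so the length is eps / 2**(k+1)
        cover.append((r - half, r + half))
    return cover

if __name__ == "__main__":
    eps = F(1, 1000)
    c = cover_first_rationals(eps, 50)
    total = sum((b - a for a, b in c), F(0))
    print(float(total), total < eps)  # total = eps/2 - eps/2**51, well under eps
```

The intervals overlap and pile up near every point of $[0, 1]$, yet their total length never exceeds $\varepsilon/2$. With only finitely many intervals allowed, as in Harnack's content, no such cover of all of $Q_1$ exists.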

A lesser mathematician might have retreated before the phenomenon 
of a dense set with measure zero. Dense sets, after all, were ubiquitous 
enough to be present in any interval no matter how tiny. Harnack himself
had started down this path twenty years earlier but had rejected the idea 
as being ridiculous [6]. Such a prospect seemed sufficiently paradoxical to 
convince him to stick with finite coverings. 

But Lebesgue was not deterred, and his approach proved its worth 
when he found the long-sought relationship between a function's integra- 
bility and its points of continuity. “How discontinuous can an integrable 
function be?” was the question. Here is the simple answer. 


Theorem: For a bounded function f to be Riemann integrable on [a, b], it 
is necessary and sufficient that the set of its points of discontinuity be 
of measure zero [7]. 


That is, he filled the critical blank in (2) with the condition $m(D_f) = 0$. In
many books, this is called “Lebesgue’s theorem,” indicating that, among 
the large number of results he eventually proved, this one was especially 
significant. 

At the heart of Lebesgue’s argument, not surprisingly, lay the Riemann 
integrability condition, which can be recast as: f is Riemann integrable 
if and only if, for any $\varepsilon > 0$ and any $\sigma > 0$, we can partition $[a, b]$ into finitely many subintervals in such a way that those containing points where the oscillation of the function is greater than $\sigma$ (what we called the Type A subintervals) have combined length less than $\varepsilon$.

We observe that by the time of Lebesgue, the notion of a function's 
“oscillation” at a point had been made more precise than in Riemann’s day. 
For our purposes, we shall continue to think of it informally as the maxi- 
mum variability of the function in the vicinity of the point. In addition, it 
was known that a function is continuous at $x_0$ if and only if its oscillation at $x_0$ is zero.

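Lebesgue introduced $G_f(\sigma)$ as the set of points in $[a, b]$ where the function's oscillation is greater than or equal to $\sigma$ and showed that $G_f(\sigma)$ is a closed, bounded set. Because $C_f = \{x \mid \text{the oscillation at } x \text{ is zero}\}$, we know that

$$D_f = \{x \mid \text{the oscillation at } x \text{ is greater than zero}\} = G_f(1) \cup G_f\!\left(\tfrac{1}{2}\right) \cup \cdots \cup G_f\!\left(\tfrac{1}{k}\right) \cup \cdots \qquad (3)$$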


The validity of equation (3) should be clear. On the one hand, at any point of discontinuity, the oscillation must be positive and hence exceed $\frac{1}{N}$ for some whole number $N$. This means the discontinuity point belongs to $G_f\!\left(\frac{1}{N}\right)$ and consequently to the union on the right side of (3). Conversely, any point in this union must belong to some $G_f\!\left(\frac{1}{N}\right)$ and thus has a positive oscillation, making it a discontinuity point.
With this background, we consider Lebesgue’s argument. 


Proof: First, assume the bounded function $f$ is Riemann integrable on $[a, b]$. For any whole number $k$, the integrability condition guarantees that the set of points where the oscillation is greater than $\frac{1}{2k}$ can be enclosed in finitely many intervals whose combined length is as small as we wish. Thus this set, as well as its subset $G_f\!\left(\frac{1}{k}\right)$, has zero content, and so $G_f\!\left(\frac{1}{k}\right)$ has measure zero. By theorem 1, the union

$$G_f(1) \cup G_f\!\left(\tfrac{1}{2}\right) \cup \cdots \cup G_f\!\left(\tfrac{1}{k}\right) \cup \cdots$$

will then be of measure zero, which implies, by (3), that $D_f$ is of measure zero also. This completes one direction of the proof.
For the converse, assume that $m(D_f) = 0$ and let both $\varepsilon > 0$ and $\sigma > 0$. Choose a whole number $k$ with $\frac{1}{k} < \sigma$. Then the set of points where the oscillation exceeds $\sigma$ is a subset of $G_f\!\left(\frac{1}{k}\right)$ which, in turn, is a subset of $D_f$. Hence, $G_f\!\left(\frac{1}{k}\right)$ is of measure zero and so can be enclosed in a denumerable collection of (open) intervals of total length less than $\varepsilon$. Because $G_f\!\left(\frac{1}{k}\right)$ is closed and bounded, Lebesgue could apply the famous Heine–Borel theorem to conclude that $G_f\!\left(\frac{1}{k}\right)$ lay within a finite subcollection of these open intervals [8]. This finite subcollection obviously has total length less than $\varepsilon$ and covers not only $G_f\!\left(\frac{1}{k}\right)$ but the smaller set of points where the oscillation exceeds $\sigma$. In short, the integrability condition is satisfied and $f$ is Riemann integrable. Q.E.D.


Later, Lebesgue defined a property to hold almost everywhere if the set 
of points where the property fails to hold is of measure zero. With this ter- 
minology, we rephrase Lebesgue’s theorem succinctly as follows: A bound- 
ed function on [a, b] is Riemann integrable if and only if it is continuous 
almost everywhere. 

We can use this characterization, for example, to give an instant proof 
of the integrability of the ruler function R on [0, 1]. As we demonstrated, 
$R$ is continuous except at the set of rational points, whose measure is zero.
This means that the ruler function is continuous almost everywhere and 
so is Riemann integrable. Case closed. 
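Lebesgue's criterion settles the matter at once, but it is reassuring to watch the Riemann machinery agree. The Python sketch below computes exact upper (Darboux) sums of the ruler function over $n$ equal subintervals of $[0, 1]$, assuming the chapter's definition $R(p/q) = 1/q$ in lowest terms (so $R(0) = R(1) = 1$) and $R = 0$ at irrationals. On a subinterval $[a, b]$, the largest value of $R$ is $1/q$ for the smallest $q$ such that some $p/q$ lies in $[a, b]$. The lower sums are all 0, because every subinterval contains irrationals. Helper names are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

def max_of_ruler(a: Fraction, b: Fraction) -> Fraction:
    """Largest value of R on [a, b]: 1/q for the smallest q with some p/q in [a, b]."""
    q = 1
    while ceil(a * q) > floor(b * q):  # no integer p satisfies a <= p/q <= b
        q += 1
    return Fraction(1, q)  # the minimal q forces p/q into lowest terms

def upper_sum(n: int) -> Fraction:
    """Upper Darboux sum of R for the partition of [0, 1] into n equal pieces."""
    pieces = (max_of_ruler(Fraction(i, n), Fraction(i + 1, n)) for i in range(n))
    return sum(pieces, Fraction(0)) / n

if __name__ == "__main__":
    for n in (1, 20, 200):
        print(n, float(upper_sum(n)))  # upper sums shrink toward 0, the lower sums' value
```

The loop in `max_of_ruler` always terminates, since $q = n$ works on an interval $[i/n, (i+1)/n]$. Upper sums sinking toward 0 while lower sums stay at 0 is the Riemann-side picture of the integrability that Lebesgue's theorem predicts.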

Lebesgue’s theorem is a classic of mathematical analysis. In light of 
what was to come, there is a certain irony in the fact that the person who 
finally understood the Riemann integral was the one who would soon ren- 
der it obsolete: Henri Lebesgue. 


THE MEASURE OF SETS 


The notion of zero measure, for all of its importance, is applicable 
only for certain sets on the real line. As he continued his thesis, Lebesgue 
defined “measure” for a much larger collection of sets. The basic idea was 
borrowed from his countryman Émile Borel (1871-1956), but Lebesgue
improved upon it (dare we say?) immeasurably. 

The approach has a familiar ring. For a set $E \subseteq [a, b]$, Lebesgue wrote:


We can enclose its points within a finite or denumerably infinite 
number of intervals; the measure of the set of points of these 
intervals is...the sum of their lengths; this sum is an upper 
bound for the measure of E. The set of all such sums has a smallest limit $m_e(E)$, the outer measure of E [9].
Symbolically, this amounts to

$$m_e(E) = \inf\Big\{\sum_{k=1}^{\infty} (b_k - a_k) \;\Big|\; E \subseteq (a_1, b_1) \cup (a_2, b_2) \cup (a_3, b_3) \cup \cdots\Big\},$$

where we have employed the infimum, or greatest lower bound, of the set in question. Again, the difference between outer measure and outer content is that Lebesgue allowed for denumerably infinite coverings along with the finite ones. He observed at once that $m_e(E) \le c_e(E)$, for taking more coverings can only decrease their greatest lower bound.

Next, he looked at the complement of $E$ in $[a, b]$, which we write as $E^c = \{x \mid x \in [a, b] \text{ but } x \notin E\}$. With the definition above, he found the outer measure of $E^c$ and then defined the inner measure of $E$ as $m_i(E) = (b - a) - m_e(E^c)$.

Rather than determine the inner measure of $E$ by means of the outer measure of its complement, a modern treatment is likely to “fill” the set $E$ from within by finite or denumerably infinite unions of intervals and then take the least upper bound, or supremum, of the sum of their lengths. That is,

$$m_i(E) = \sup\Big\{\sum_{k=1}^{\infty} (b_k - a_k) \;\Big|\; (a_1, b_1) \cup (a_2, b_2) \cup (a_3, b_3) \cup \cdots \subseteq E\Big\}.$$

For bounded sets, the two approaches are equivalent, but the second one applies equally well if $E$ is unbounded.

At this point, Lebesgue showed that “the inner measure is never greater than the outer measure,” that is, $m_i(E) \le m_e(E)$, and then stated the key definition: “Sets for which the inner and outer measures are equal are called measurable and their measure is the common value of $m_i(E)$ and $m_e(E)$” [10].

The family of measurable sets is truly immense. It includes any inter- 
val, any open set, any closed set, and any set of measure zero, along with 
the set of rationals and the set of irrationals. In fact, for some time mathe- 
maticians were unable to find a set that was not measurable, that is, one 
for which $m_i(E) < m_e(E)$. These were eventually constructed by means of
the axiom of choice and turned out to be extremely complicated [11]. 

Lebesgue explored the consequences of his definitions, three of the 
most basic of which were: 


1. If $E$ is measurable, then $m(E) \ge 0$.
2. The measure of an interval is its length.
3. If $E_1, E_2, \ldots, E_k, \ldots$ is a finite or denumerably infinite collection of pairwise disjoint measurable sets and if $E = E_1 \cup E_2 \cup \cdots \cup E_k \cup \cdots$ is their union, then $E$ is measurable and $m(E) = m(E_1) + m(E_2) + \cdots + m(E_k) + \cdots$.


This third condition is the additivity property that outer content lacked. 
With it, we can easily find the measure of the set of irrationals in $[0, 1]$, which we called $I_1$ above. We note that $[0, 1] = Q_1 \cup I_1$, where the two sets on the right are disjoint and measurable. Thus, $1 = m[0, 1] = m(Q_1 \cup I_1) = m(Q_1) + m(I_1) = 0 + m(I_1)$, and so $m(I_1) = 1$. In terms of measure, the irrationals dominate $[0, 1]$, whereas the rationals are insignificant.

Among other things, Lebesgue measure provided a new dichotomy 
between “small” (measure zero) and “large” (positive measure). This took 
its place alongside the cardinality dichotomy (denumerable versus nonde- 
numerable) and the topological one (first category versus second category). 
In all three, the rationals qualify as small for they are of measure zero, 
denumerable, and of the first category, whereas the irrationals are large 
(being of positive measure), nondenumerable, and of the second category. 

To continue with this idea, we have seen that, for any of these 
dichotomies, subsets and denumerable unions of “small” sets are “small,” 
and we have proved that a denumerable set is both of the first category 
and of measure zero. However, other “large/small” connections do not 
hold. It is possible to find first category sets that are nondenumerable and 
of positive measure and to find measure zero sets that are nondenumer- 
able and of the second category [12]. Obviously, these concepts had carried 
mathematicians into some deep waters. 

In his dissertation, Lebesgue was not content to consider just meas- 
urable sets. He defined a measurable function in these words: “We say that 
a function $f$, bounded or not, is measurable if, for any $\alpha < \beta$, the set $\{x \mid \alpha < f(x) < \beta\}$ is measurable” [13]. The diagram in figure 14.2 gives a geometric sense of this definition. For $\alpha < \beta$ along the $y$-axis, we collect all points $x$ in the domain whose functional values fall between $\alpha$ and $\beta$. If this set is measurable for all choices of $\alpha$ and $\beta$, we say that $f$ is a measurable function.

Using properties of measurable sets, Lebesgue showed that $f$ is a measurable function if and only if, for any $\alpha$, the set $\{x \mid \alpha < f(x)\}$ is measurable. From this result it easily follows that Dirichlet’s function $d$ is measurable, because there are only three possibilities for the set $\{x \mid \alpha < d(x)\}$: it is empty if $\alpha \ge 1$; it is the set of rationals if $0 \le \alpha < 1$; and it is the set of all real numbers if $\alpha < 0$. In each case, these are measurable sets, so $d$ is a measurable function.
Figure 14.2: the set $\{x \mid \alpha < f(x) < \beta\}$


We have seen that Dirichlet’s function is neither pointwise discontinuous nor Riemann integrable. With its wild behavior, it is excluded from these
two families of functions. But it is measurable. One begins to sense that, in 
introducing measurable functions, Lebesgue had cast his net very widely. 

He continued his line of reasoning by proving that, for a measurable 
function, each of the following is a measurable set: 


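$$\{x \mid f(x) = \alpha\},\quad \{x \mid \alpha \le f(x) < \beta\},\quad \{x \mid \alpha < f(x) \le \beta\},\quad\text{and}\quad \{x \mid \alpha \le f(x) \le \beta\}. \qquad (4)$$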


He also showed that sums and products of two measurable functions are 
measurable, implying that we cannot leave the world of measurable func- 
tions by adding or multiplying. “But,” wrote Lebesgue, “there is more.” 

Theorem: If $\{f_k\}$ is a sequence of measurable functions and $f(x) = \lim_{k\to\infty} f_k(x)$ is their pointwise limit, then $f$ is measurable also [14].
This is remarkable, for it says that we cannot escape the world of measurable functions even by taking pointwise limits. In (1) above we saw that
this is not true of bounded, Riemann-integrable functions, and in earlier 
chapters we noted a similar deficiency for continuous functions or those 
of Baire class 1. In those situations, the family of functions was too 
restrictive to contain all of its pointwise limits. Measurable functions, by 
contrast, are strikingly inclusive. 

Lebesgue was quick to observe a fascinating consequence of these 
theorems. We can easily see that constant functions are measurable, as is 
the identity f(x) = x. By adding and multiplying, it follows that any poly- 
nomial is measurable. The Weierstrass approximation theorem (see 
chapter 9) guarantees that any continuous function on [a, b] is the uni- 
form limit of a sequence of polynomials, and so any continuous function 
is measurable by the theorem above. For the same reason, pointwise 
limits of continuous functions are measurable, but these are just the functions in Baire class 1. This means that derivatives of differentiable functions, which we saw in chapter 13 belong to Baire class 0 or Baire class 1, are measurable. And so too are functions of Baire class 2, such
as Dirichlet’s function, for these are pointwise limits of functions in Baire 
class 1. This same reasoning reveals that any function of any Baire class 
is measurable. 

It is fair to say that any function ever considered prior to 1900 
belonged to the family of Lebesgue-measurable functions. It was a really, 
really big collection. 

In some sense, however, all of this is prologue. Using the ideas of mea- 
sure and measurable function, Lebesgue was ready to make his greatest 
contribution. 


THE LEBESGUE INTEGRAL 


Riemann’s integral of a bounded function f started with a partition of
the domain [a, b] into tiny subintervals, built rectangles upon these subin- 
tervals whose heights were determined by the functional values, and finally 
let the width of the largest subinterval shrink to zero. By contrast, Lebesgue’s 
alternative was predicated upon an idea as simple as it was bold: partition 
not the function’s domain, but its range. 

To illustrate, we consider the bounded, measurable function f in figure 14.3. Lebesgue let l < L be the infimum and supremum of f over [a, b] (that is, the greatest lower and least upper bounds of the functional values), so that [l, L] contained the range of the function. Then, for any ε > 0, Lebesgue imagined a partition of the interval [l, L] by means of the points

$l = l_0 < l_1 < l_2 < \cdots < l_{n-1} < l_n = L$,

where the greatest gap between adjacent partition points was less than ε.

[Figure 14.3: the graph of y = f(x), with the set $E_k = \{x \mid l_k \le f(x) < l_{k+1}\}$ marked on the x-axis]

With such a partition along the y-axis, we form the “Lebesgue sum.” Like a Riemann sum, this will approximate the area under the curve with regions of known dimensions, although we can no longer be certain these regions are rectangular. Rather, we consider the subinterval $[l_k, l_{k+1})$ along the y-axis and look at the subset $E_k$ of [a, b] defined by $E_k = \{x \mid l_k \le f(x) < l_{k+1}\}$. This is the portion of the x-axis indicated in figure 14.3. Here, $E_k$ is the union of three intervals, but its structure can be much more complicated depending on the function at hand.
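For a well-behaved function, the set $E_k$ can be located by brute force. The Python sketch below assumes (as in figure 14.3, though not in general) that $E_k$ is a finite union of intervals, so it can be recovered by scanning equally spaced sample points and merging consecutive hits into runs. The function $\sin 2\pi x$, the band [0.5, 0.75), and the helper names `level_set_intervals` and `approximate_measure` are hypothetical choices made only for illustration.

```python
import math


def level_set_intervals(f, a, b, lo, hi, n_samples=200_000):
    """Approximate E = {x in [a, b] | lo <= f(x) < hi} as a list of intervals.

    Assumption: E is a finite union of intervals, so scanning sample points
    and merging consecutive hits into runs recovers it (up to the grid step).
    """
    h = (b - a) / n_samples
    intervals, start = [], None
    for i in range(n_samples + 1):
        x = a + i * h
        inside = lo <= f(x) < hi
        if inside and start is None:
            start = x                          # a run of hits begins here
        elif not inside and start is not None:
            intervals.append((start, x - h))   # the run ended at the previous sample
            start = None
    if start is not None:                      # a run that reaches the endpoint b
        intervals.append((start, b))
    return intervals


def approximate_measure(intervals):
    """Total length of a list of disjoint intervals."""
    return sum(right - left for left, right in intervals)


if __name__ == "__main__":
    E = level_set_intervals(lambda x: math.sin(2 * math.pi * x), 0.0, 1.0, 0.5, 0.75)
    print(len(E), approximate_measure(E))
```

For this band the set consists of two intervals, roughly [0.083, 0.135) and (0.365, 0.417], with total length about 0.1033. A function as wild as Dirichlet’s would defeat this sampling approach entirely, which is why Lebesgue needed measure rather than interval length.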

At the analogous stage in Riemann’s approach, we would construct a 
rectangle whose height was an approximation of the function’s value, 
whose width was the length of the appropriate subinterval, and whose area was the product of these two. For Lebesgue, we use $l_k$ to approximate the value of the function on the set $E_k$, but how do we determine “length” if $E_k$ is not an interval?

The answer, which should come as no surprise, is to use the measure of the set $E_k$ in this role. Upon multiplying height and “length,” we get $l_k \cdot m(E_k)$ as the counterpart of the area of one of Riemann’s thin rectangles.


We sum these over all subintervals of the range to get a Lebesgue sum,

$$\sum_{k=0}^{n} l_k \cdot m(E_k),$$

where for the last term of this sum we let $E_n = \{x \mid f(x) = L\}$.

Finally, Lebesgue let $\varepsilon \to 0$ so that the maximum value of $l_{k+1} - l_k$ approaches zero as well. Should this limiting process lead to a unique value, we say that f is Lebesgue integrable over [a, b] and define

$$\int_a^b f(x)\,dx = \lim_{\varepsilon \to 0} \left[\sum_{k=0}^{n} l_k \cdot m(E_k)\right].$$
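The definition can be imitated numerically. The following Python sketch is an illustration under stated assumptions, not Lebesgue’s construction: it cuts the range into n equal bands (so the mesh (L − l)/n plays the role of ε), uses the sampled minimum and maximum as stand-ins for l and L, and estimates each measure $m(E_k)$ by counting equally spaced sample points whose functional values land in the band. The name `lebesgue_sum` and its parameters are hypothetical.

```python
def lebesgue_sum(f, a, b, n_bands, n_samples=100_000):
    """Approximate sum_{k=0}^{n} l_k * m(E_k) for f on [a, b].

    Assumptions (illustrative only): the range [l, L] is estimated from
    samples, it is cut into n_bands equal bands, and m(E_k) is estimated
    as (number of sample points with f(x) in band k) * h.
    """
    h = (b - a) / n_samples
    ys = [f(a + (i + 0.5) * h) for i in range(n_samples)]  # values at grid midpoints
    lo, hi = min(ys), max(ys)              # sampled stand-ins for l and L
    if hi == lo:                           # constant function: a single level
        return lo * (b - a)
    width = (hi - lo) / n_bands            # mesh of the range partition
    levels = [lo + k * width for k in range(n_bands)] + [hi]  # l_0 < l_1 < ... < l_n
    measure = [0.0] * (n_bands + 1)        # estimates of m(E_0), ..., m(E_n)
    for y in ys:
        if y == hi:
            k = n_bands                    # E_n = {x | f(x) = L}
        else:
            # E_k = {x | l_k <= f(x) < l_{k+1}}; clamp guards against rounding
            k = min(int((y - lo) / width), n_bands - 1)
        measure[k] += h
    return sum(l_k * m_k for l_k, m_k in zip(levels, measure))


if __name__ == "__main__":
    print(lebesgue_sum(lambda x: x * x, 0.0, 1.0, 1000))
```

For $f(x) = x^2$ on [0, 1] with 1,000 bands, the result sits just below 1/3: each sample is credited with the bottom edge $l_k$ of its band, so the sum undershoots by at most one band width times b − a, and shrinking the bands drives it toward $\int_0^1 x^2\,dx = 1/3$.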


We must address two issues before proceeding. First, it is clear that the sets $E_0, E_1, E_2, \ldots, E_{n-1}, E_n$ partition [a, b] into subsets, although not necessarily into subintervals. Second, our assumption that f is measurable implies, by (4), that each $E_k = \{x \mid l_k \le f(x) < l_{k+1}\}$, along with $E_n = \{x \mid f(x) = l_n\}$, is a measurable set, and so we may properly talk about $m(E_k)$. Everything is falling nicely into place.

In a work written for a general audience, Lebesgue used an analogy to 
contrast Riemann’s approach and his own [15]. He imagined a shopkeep- 
er who, at day’s end, wishes to total the receipts. One option is for the 
merchant to “count coins and bills at random in the order in which they 
came to hand.” Such a merchant, whom Lebesgue called “unsystematic,” 
would add the money in the sequence in which it was collected: a dollar, 
a dime, a quarter, another dollar, another dime, and so on. This is like 
taking functional values as they are encountered while moving from 
left to right across the interval [a, b]. With Riemann’s integral, the process 
is “driven” by values in the domain, and values in the range fall where 
they may. 

But, Lebesgue continued, would it not be preferable for the merchant 
to ignore the order in which the money arrived and instead group it by 
denomination? For instance, it might turn out that there were in all a dozen dimes, thirty quarters, fifty dollars, and so on. The calculation of the day’s receipts would then be simple: multiply the value of the currency (which corresponds to the functional value $l_k$) by the number of pieces (which corresponds to the measure of $E_k$) and add them up. This time, as with Lebesgue’s integral, the process is driven by values in the range, and the sets $E_k$ that subdivide the domain fall where they may.
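Lebesgue’s two merchants can be put side by side in a short Python sketch. The coin counts come from the example above (a dozen dimes, thirty quarters, fifty dollars); the order of arrival is a hypothetical shuffle, and amounts are kept in cents so that the arithmetic is exact.

```python
import random
from collections import Counter


def unsystematic_total(receipts):
    """Riemann-style merchant: add each amount in the order it arrived."""
    total = 0
    for amount in receipts:
        total += amount
    return total


def systematic_total(receipts):
    """Lebesgue-style merchant: group by denomination, then add value * number of pieces."""
    counts = Counter(receipts)  # denomination -> number of pieces
    return sum(value * count for value, count in counts.items())


# 12 dimes, 30 quarters, and 50 dollar bills, in cents
receipts = [10] * 12 + [25] * 30 + [100] * 50
random.Random(1904).shuffle(receipts)  # hypothetical order of arrival

if __name__ == "__main__":
    print(unsystematic_total(receipts), systematic_total(receipts))
```

For finitely many coins the two totals always agree, just as Lebesgue conceded; the grouping by denomination only becomes decisive when the “coins” are infinitely many indivisibles.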

Lebesgue conceded that for the finite quantities involved in running a 
business, the two approaches yield the same outcome. “But for us who 
must add an infinite number of indivisibles,” he wrote, “the difference 
between the two methods is of capital importance.” He emphasized this 
difference by observing that 


our constructive definition of the integral is quite analogous to 
that of Riemann; but whereas Riemann divided into small subin- 
tervals the interval of variation of x, it is the interval of variation of 
f(x) that we have subdivided [16]. 


To show that he was not chasing definitions pointlessly, Lebesgue 
proved a number of theorems about his new integral. We shall consider a 
few of these, albeit without proof. 


Theorem 1: If f is a bounded, Riemann-integrable function on [a, b], then f is Lebesgue integrable and the numerical value of $\int_a^b f(x)\,dx$ is the same in either case.


This is comforting, for it says that Lebesgue preserved the best of Rie- 
mann. 


Theorem 2: If f is a bounded, measurable function on [a, b], then its 
Lebesgue integral exists. 


Here we see the power of Lebesgue’s ideas, because the family of 
measurable functions is far more encompassing than the family of Rie- 
mann integrable ones (i.e., those continuous almost everywhere). To put 
it simply, Lebesgue could integrate more functions than Riemann. Theo- 
rems 1 and 2 show that Lebesgue had genuinely extended the previous 
theory. 

For example, we have seen that Dirichlet’s function is bounded and measurable on [0, 1]. Consequently, $\int_0^1 d(x)\,dx$ exists as a Lebesgue integral, in spite of the fact that it is meaningless under Riemann’s theory.
Better yet, it is easy to calculate the value of this integral. We start with any partition of the range: $0 = l_0 < l_1 < l_2 < \cdots < l_n = 1$. By the nature of Dirichlet’s function,

$E_0 = \{x \mid 0 \le d(x) < l_1\} = \mathbb{I}_1$, the set of irrationals in [0, 1],
$E_k = \{x \mid l_k \le d(x) < l_{k+1}\} = \varnothing$ for $k = 1, 2, \ldots, n-1$,
$E_n = \{x \mid d(x) = 1\} = \mathbb{Q}_1$, the set of rationals in [0, 1].

For this arbitrary partition, the Lebesgue sum is

$$\begin{aligned}
\sum_{k=0}^{n} l_k \cdot m(E_k) &= 0 \cdot m(E_0) + l_1 \cdot m(E_1) + \cdots + l_{n-1} \cdot m(E_{n-1}) + 1 \cdot m(E_n) \\
&= 0 \cdot m(\mathbb{I}_1) + l_1 \cdot m(\varnothing) + \cdots + l_{n-1} \cdot m(\varnothing) + 1 \cdot m(\mathbb{Q}_1) \\
&= 0 \cdot 1 + l_1 \cdot 0 + \cdots + l_{n-1} \cdot 0 + 1 \cdot 0 = 0.
\end{aligned}$$


And because the Lebesgue sum is zero for any partition, the limit of all such is zero as well. That is, $\int_0^1 d(x)\,dx = 0$.
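The same bookkeeping works for any function that takes only finitely many values, provided we are told the measure of the set on which each value is taken. The Python sketch below assumes the function is described by hypothetical (value, measure) pairs supplied by hand; Dirichlet’s function is entered as value 0 on a set of measure 1 (the irrationals) and value 1 on a set of measure 0 (the rationals). The name `lebesgue_sum_finite_valued` is illustrative only.

```python
from fractions import Fraction


def lebesgue_sum_finite_valued(pieces, levels):
    """Lebesgue sum sum_{k=0}^{n} l_k * m(E_k) for a function with finitely many values.

    pieces: hypothetical (value, measure) pairs, meaning f takes `value`
            on a set whose measure is `measure` (supplied by hand).
    levels: a partition l_0 < l_1 < ... < l_n containing every value.
    """
    n = len(levels) - 1
    measure = [Fraction(0)] * (n + 1)          # m(E_0), ..., m(E_n)
    for value, m in pieces:
        if value == levels[n]:
            measure[n] += m                    # E_n = {x | f(x) = l_n}
            continue
        for k in range(n):
            if levels[k] <= value < levels[k + 1]:
                measure[k] += m                # E_k = {x | l_k <= f(x) < l_{k+1}}
                break
        else:
            raise ValueError(f"value {value} lies outside [l_0, l_n]")
    return sum(l_k * m_k for l_k, m_k in zip(levels, measure))


# Dirichlet's function: 0 on the irrationals (measure 1), 1 on the rationals (measure 0).
DIRICHLET = [(Fraction(0), Fraction(1)), (Fraction(1), Fraction(0))]

if __name__ == "__main__":
    partition = [Fraction(k, 7) for k in range(8)]  # 0 = l_0 < l_1 < ... < l_7 = 1
    print(lebesgue_sum_finite_valued(DIRICHLET, partition))  # prints 0
```

Whatever partition is chosen, the irrationals fall into $E_0$ with height $l_0 = 0$ and the rationals into $E_n$ with measure 0, so every term of the sum vanishes.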


The fact that Dirichlet’s function is everywhere discontinuous ren- 
dered it nonintegrable for Riemann, but such universal discontinuity was 
of no consequence for Lebesgue. Here was indisputable mathematical 
progress. 


Theorem 3: If f and g are bounded, measurable functions on [a, b] and f(x) = g(x) almost everywhere, then $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$.


This result says that changing the values of a measurable function on 
a set of measure zero has no effect on the value of its Lebesgue integral. 
For Riemann, we can change the function's value at finitely many points 
without altering the integral, but once we tamper with an infinitude of 
points, all bets are off. By contrast, Lebesgue’s integral is sufficiently 
tamper-proof that we can modify the function on an infinite set of zero 
measure yet leave the integral—and the integrability—intact. 

To see this theorem in action, we revisit Dirichlet’s function d and the 
ruler function R on [0, 1] and form a trio by introducing g(x) = 0 for all x 
in [0, 1]. The three functions d, R, and g are certainly not identical, for 
they differ at rational points in the unit interval. But such differences are 
trivial from a measure-theoretic standpoint because $m\{x \mid d(x) \ne g(x)\} = m\{x \mid R(x) \ne g(x)\} = m(\mathbb{Q}_1) = 0$. In other words, Dirichlet’s function and the ruler function equal zero almost everywhere. It follows from Theorem 3 that $\int_0^1 d(x)\,dx = \int_0^1 R(x)\,dx = \int_0^1 g(x)\,dx = \int_0^1 0\,dx = 0$, as we have seen previously.
Yet another important result from Lebesgue’s thesis is now called the bounded convergence theorem [17]. He proved that, under very mild conditions, it is permissible to interchange limits and the integral. This was a major advance over Riemann’s theory.


Theorem 4 (Lebesgue’s bounded convergence theorem): If $\{f_k\}$ is a sequence of measurable functions on [a, b] that is uniformly bounded by the number M > 0 (i.e., $|f_k(x)| \le M$ for all x in [a, b] and for all $k \ge 1$) and if $f(x) = \lim_{k \to \infty} f_k(x)$ is the pointwise limit, then

$$\lim_{k \to \infty} \int_a^b f_k(x)\,dx = \int_a^b f(x)\,dx = \int_a^b \left[\lim_{k \to \infty} f_k(x)\right] dx.$$


If the measurable functions $f_n(x)$, uniformly bounded, that is, bounded whatever n and x may be, have a limit f(x), then the integral of $f_n(x)$ tends to that of f(x).

Indeed, we know that f(x) is integrable; let us evaluate

$$\int_a^b [f(x) - f_n(x)]\,dx.$$

If we always have $|f_n(x)| < M$, and if $f - f_n$ is less than ε in $E_n$, then $f - f_n$, being less than the function equal to ε in $E_n$ and to M in $C(E_n)$, has an integral at most equal in absolute value to

$$\varepsilon\, m(E_n) + M\, m[C(E_n)].$$

But ε is arbitrary, and $m[C(E_n)]$ tends to zero with $1/n$, because there is no point common to all the $E_n$; therefore

$$\int_a^b (f - f_n)\,dx$$

tends to zero. The property is proved.

Lebesgue’s proof of the bounded convergence theorem (1904)
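A quick numerical experiment, offered as an illustration rather than a proof, shows both the theorem and why its uniform bound matters. The two families are standard textbook examples chosen here for illustration, not taken from Lebesgue. First, $f_k(x) = x^k$ is bounded by M = 1 on [0, 1] and converges pointwise to a function that is 0 except at x = 1, hence 0 almost everywhere, so the theorem predicts integrals tending to 0. Second, $g_k(x) = k$ on (0, 1/k) and 0 elsewhere also converges pointwise to 0, but it has no uniform bound. The midpoint-rule helper `midpoint_integral` is a hypothetical stand-in for exact integration.

```python
def midpoint_integral(func, a=0.0, b=1.0, n=100_000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def f(k, x):
    """Uniformly bounded family: |x**k| <= 1 on [0, 1]."""
    return x ** k


def g(k, x):
    """Not uniformly bounded: g_k reaches height k on the interval (0, 1/k)."""
    return k if 0 < x < 1 / k else 0


if __name__ == "__main__":
    for k in (1, 10, 100):
        print(k, midpoint_integral(lambda x: f(k, x)), midpoint_integral(lambda x: g(k, x)))
```

The integrals of $f_k$ shrink like 1/(k + 1), as the theorem requires, while those of $g_k$ stay at 1 even though $g_k(x) \to 0$ at every point: without a uniform bound, the limit and the integral cannot be interchanged.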




We can use this to launch our third attack upon $\int_0^1 d(x)\,dx$. Earlier, we introduced a sequence of functions $\{\varphi_k\}$ on [0, 1] for which $\lim_{k \to \infty} \varphi_k(x) = d(x)$, as seen in (1). Clearly, $|\varphi_k(x)| \le 1$ for all x and all k, so this is a uniformly bounded family, and because each $\varphi_k$ is zero except at k points, we know that each function is measurable with $\int_0^1 \varphi_k(x)\,dx = 0$. By Lebesgue’s bounded convergence theorem, we conclude yet again that

$$\int_0^1 d(x)\,dx = \int_0^1 \left[\lim_{k \to \infty} \varphi_k(x)\right] dx = \lim_{k \to \infty} \int_0^1 \varphi_k(x)\,dx = \lim_{k \to \infty} 0 = 0.$$
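The sequence $\{\varphi_k\}$ rests on an enumeration $r_1, r_2, r_3, \ldots$ of the rationals in [0, 1]. The Python sketch below assumes one particular enumeration (by increasing denominator), which may differ from the ordering used in (1); the function names are hypothetical. It shows that each $\varphi_k$ is nonzero at only k points, and that any given rational x is eventually caught, after which $\varphi_k(x) = 1$ for every larger k.

```python
from fractions import Fraction
from itertools import count, islice


def rationals_in_unit_interval():
    """Yield 0, 1, 1/2, 1/3, 2/3, 1/4, 3/4, ...: each rational in [0, 1] exactly once."""
    yield Fraction(0)
    yield Fraction(1)
    for q in count(2):
        for p in range(1, q):
            r = Fraction(p, q)
            if r.denominator == q:  # skip fractions not in lowest terms, such as 2/4
                yield r


def phi(k, x):
    """phi_k(x) = 1 if x is one of r_1, ..., r_k, and 0 otherwise."""
    return 1 if x in set(islice(rationals_in_unit_interval(), k)) else 0


def first_index(x):
    """The k at which x first appears, so phi_k(x) = 1 from then on.

    x must be a rational in [0, 1]; otherwise the search never ends.
    """
    for k, r in enumerate(rationals_in_unit_interval(), start=1):
        if r == x:
            return k


if __name__ == "__main__":
    print([str(r) for r in islice(rationals_in_unit_interval(), 8)])
    print(first_index(Fraction(3, 4)), phi(7, Fraction(3, 4)))
```

Each $\varphi_k$ differs from the zero function at only k points, so its integral is 0, yet at every rational point the values eventually switch on for good, which is exactly how the sequence converges to Dirichlet’s function.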


There is time for one last flourish. We recall that Volterra had discov- 
ered a pathological function with a bounded, nonintegrable derivative. Of 
course, in Volterra’s day, “nonintegrable” meant “non-Riemann-integrable.”

By adopting Lebesgue’s alternative, however, the pathology disappears. For if F is differentiable with bounded derivative F′, then the Lebesgue integral $\int_a^b F'(x)\,dx$ must exist because, as we saw in chapter 13, F′ belongs to Baire class 0 or Baire class 1. Such a function is measurable, and since F′ is also bounded, Theorem 2 guarantees that it is Lebesgue integrable.

Better yet, the bounded convergence theorem allowed Lebesgue to 
prove the following [18]. 


Theorem 5: If F is differentiable on [a, b] with bounded derivative, then $\int_a^b F'(x)\,dx = F(b) - F(a)$.


Here, back in all its original glory, is the fundamental theorem of cal- 
culus. With Lebesgue’s integral, there was no longer the need to attach 
restrictive conditions to the derivative, for example, a requirement that it 
be continuous, in order for the fundamental theorem to hold. In a sense, 
then, Lebesgue restored this central result of calculus to a state as “natural” 
as it was in the era of Newton and Leibniz. 
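As a worked illustration, consider the standard example $F(x) = x^2 \sin(1/x)$ for $x \ne 0$ with $F(0) = 0$, chosen here for illustration and not taken from Lebesgue. Its derivative is bounded on [0, 1] yet discontinuous at 0, so Theorem 5 applies even though no continuity of F′ is available. (This particular F′ is still Riemann integrable, having a single discontinuity; an example such as Volterra’s is needed to defeat Riemann.)

$$\begin{aligned}
F'(x) &= 2x\sin\frac{1}{x} - \cos\frac{1}{x} \quad (x \ne 0), \qquad F'(0) = \lim_{h \to 0} h \sin\frac{1}{h} = 0, \\
|F'(x)| &\le 2|x| + 1 \le 3 \quad \text{on } [0, 1], \\
\int_0^1 F'(x)\,dx &= F(1) - F(0) = \sin 1.
\end{aligned}$$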

In closing, we acknowledge that many, many technicalities have been 
glossed over in this brief introduction to Lebesgue’s work. A complete 
development of his ideas would require a significant investment of time 
and space, which makes it all the more amazing that these ideas are taken 
from his doctoral thesis! It is no wonder that the dissertation stands in a
class by itself. 

We end with a final observation from Lebesgue. In the preface of his 
great 1904 work, he conceded that his theorems carry us from “nice” 
functions into a more complicated realm, yet it is necessary to inhabit this 
realm in order to solve simply stated problems of historic interest. “It is for 
the resolution of these problems,” he wrote, “and not for the love of com- 
plications, that I introduce in this book a definition of the integral more 
general than that of Riemann and containing it as a particular case” [19]. 

To resolve historic problems rather than to complicate life: a worthy 
principle that guided Henri Lebesgue on his mathematical journey. 


Afterword 


Our visit to the calculus gallery has come to an end.

Along the way, we have considered thirteen mathematicians whose 
careers fall into three separate periods or, at the risk of overdoing the anal- 
ogy, into three separate wings. 

First came the Early Wing, which featured work of the creators, New- 
ton and Leibniz, as well as of their immediate followers: the Bernoulli 
Brothers and Euler. From there we visited what might be called the Classi- 
cal Wing, with a large hall devoted to Cauchy and sizable rooms for Rie- 
mann, Liouville, and Weierstrass, scholars who supplied the calculus with 
extraordinary mathematical rigor. Finally, we entered the Modern Wing of 
Cantor, Volterra, Baire, and Lebesgue, who fused the precision of the clas- 
sicists and the bold ideas of set theory. 

Clearly, the calculus on display at tour’s end was different from that 
with which it began. Mathematicians had gone from curves to functions, 
from geometry to algebra, and from intuition to cold, clear logic. The 
result was a subject far more sophisticated, and far more challenging, than 
its originators could have anticipated. 

Yet central ideas at the outset remained central ideas at the end. As 
the book unfolded, we witnessed a continuing conversation among those 
mathematicians who refined the subject over two and a half centuries. In 
a very real sense, these creators were addressing the same issues, albeit 
in increasingly complicated ways. For instance, we saw Newton
expand binomials into infinite series in 1669 and Cauchy provide con- 
vergence criteria for such series in 1827. We saw Euler calculate basic 
differentials in 1755 and Baire identify the continuity properties of 
derivatives in 1899. And we saw Leibniz apply his transmutation theo- 
rem to find areas in 1691 and Lebesgue develop his beautiful theory of 
the integral in 1904. Mathematical echoes resounded from one era to 
the next, and even as things changed, the fundamental issues of calculus 
remained. 

Our book ended with Lebesgue’s thesis, but no one should conclude 
that research in analysis ended there as well. On the contrary, his work 
revitalized the subject, which has grown and developed over the past hun- 
dred years and remains a bulwark of mathematics up to the present day. 
That story, and the new masters who emerged in the process, must remain 
for another time. 


We conclude as we began, with an observation from the great twenti- 
eth century mathematician John von Neumann. Because of achievements 
like those we have seen, von Neumann regarded calculus as the epitome 
of precise reasoning. His accolades, amply supported by the results of this 
book, will serve as the last word: 


I think it [the calculus] defines more unequivocally than anything 
else the inception of modern mathematics, and the system of 
mathematical analysis, which is its logical development, still con- 
stitutes the greatest technical advance in exact thinking. [1] 